siRNA Libraries Optimized for Predetermined Protein Families 

BACKGROUND OF THE INVENTION 
[0001] Small interfering RNAs (siRNAs) are short double-stranded RNA fragments that elicit a process known as RNA interference (RNAi), a form of sequence-specific gene silencing. Zamore, Phillip et al., Cell 101:25-33 (2000); Elbashir, Sayda M., et al., Nature 411:494-497 (2001). siRNAs are assembled into a multicomponent complex known as the RNA-induced silencing complex (RISC). The siRNAs guide RISC to homologous mRNAs, targeting them for destruction. Hammond et al., Nature Genetics Reviews 2:110-119 (2000). RNAi has been observed in a variety of organisms, including plants, insects and mammals. Since RNAi provides a means to specifically inhibit the expression of a gene by causing the rapid degradation of the gene's mRNA, much research is now being conducted to ascertain whether RNAi can be used as a therapeutic tool, i.e., as a means to target and selectively silence specific genes known to be involved in various disease processes. RNAi is also being used as a research tool in the field of functional genomics, i.e., as a means for identifying and discovering hitherto unknown genes involved in disease processes, utilizing gene discovery techniques such as Inverse Genomics®, which was developed by the Assignee hereof (see, e.g., WO 00/05415).

[0002] Various methods are known for the production of expression cassettes capable of expressing a library of siRNAs. In co-pending applications assigned to the Assignee hereof (U.S. Serial Nos. 10/628,587 and 10/626,512), there are described methods for the expression of siRNAs in which all or most of the siRNA nucleotide sequence is fully randomized. For siRNAs having a length of 21 nucleotides, the fully random siRNA library contains $4^{21}/2$, or approximately $2.2 \times 10^{12}$, unique members. A library of such size ("complexity") is very useful for purposes of gene discovery utilizing the techniques of Inverse Genomics®, but there are certain practical drawbacks inherent in the use of a library of such complexity. Under certain circumstances, using a library of such complexity may be unnecessary and even counter-productive. For example, if it is desired to study the effect of RNAi on a small number of genes known to encode a family of proteins, it would be preferable to express a more limited (less complex) library that comprises only the siRNAs that silence these genes, rather than a totally randomized library of full complexity. Heretofore, it was impossible to do so, and the only alternative was to synthesize individually each and every siRNA of interest.
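As a quick check of the arithmetic above, the short Python sketch below evaluates the size of a fully randomized library of a given length. It simply follows the text's convention of dividing $4^n$ by two; the function name `fully_random_complexity` is illustrative only and is not part of any described method.

```python
def fully_random_complexity(length: int) -> int:
    """Unique members of a fully randomized siRNA library of the given
    length, counted as 4**length / 2 following the convention in the text."""
    return 4 ** length // 2  # 4**length is even for length >= 1


n21 = fully_random_complexity(21)
print(f"{n21:.2e}")  # prints 2.20e+12
```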

[0003] The inventors hereof have now discovered a method for expressing a library of siRNAs wherein the library is optimized to include at least all siRNAs which functionally silence specific genes of interest, e.g., genes which encode a predetermined family of proteins. This novel method is highly advantageous over other methods currently known or practiced in the art. It allows for the molecular cloning of the entire targeted library of siRNAs of interest in a single step, thereby eliminating the relatively high cost and time-consumption involved in the synthesis of individual siRNAs. It also allows for the delivery of the siRNAs in a pooled fashion, making it possible to do combinatorial screening without need for more expensive robot-based high-throughput screening methods. In addition, it provides a high degree of flexibility in the design and expression of the library of interest, making it possible to modify easily the complexity of the library (i.e., increase or decrease its size) depending upon the goals of the research and the information that is available with respect to the genes or protein family of interest. Finally, since the siRNA libraries of the present invention are expressed by means of partially randomized gene sequences, they comprise not only siRNAs having the ability to silence genes encoding all the known members of a protein family of interest but additional genes as well, thereby expanding the possibilities (via techniques such as Inverse Genomics®) for discovery of novel genes heretofore not known to express proteins belonging to the family of interest.

BRIEF SUMMARY OF THE INVENTION 

[0004] The present invention provides an siRNA expression library for selective post-transcriptional silencing of genes encoding a family of proteins, wherein members of the library encode siRNA molecules that are between 15 and 30 nucleotides in length and target at least all mRNAs encoding all known members of the family of proteins. The library may comprise between 50 and one million unique members. In a preferred embodiment, the siRNA molecules are between 18 and 24 nucleotides in length. In yet another preferred embodiment, the family of proteins is any family known to be involved in disease processes, such as G protein coupled receptors, ion channels, receptor tyrosine kinases, non-receptor tyrosine kinases, nuclear hormone receptors, GTPases, ATPases, serine/threonine kinases, proteases, matrix metalloproteinases (MMPs), GTPase-activating proteins (GAPs), E3 ubiquitin ligases, or others.

[0005] The present invention also provides a method for generating an siRNA expression library for selective post-transcriptional silencing of genes encoding a family of proteins, the method comprising identifying a consensus sequence for the family of proteins and generating an siRNA expression library whose members encode siRNA molecules that target at least all mRNAs encoding all known members of the family of proteins. The consensus sequence may comprise between 15 and 30 nucleotides, and preferably between 18 and 24 nucleotides. In one embodiment, the consensus sequence is determined after identifying at least one signature motif for the family of proteins. In another embodiment, two or more variants of a signature motif for the family of proteins are identified, and a consensus sequence is determined for each of the variants.

BRIEF DESCRIPTION OF THE DRAWING 

[0006] Figure 1 depicts an exemplary DNA expression cassette for expressing the siRNA from opposing pol III promoters (U6 promoters shown) in accordance with the present invention.

DEFINITIONS

[0007] The term "nucleic acid" or "polynucleotide" refers to deoxyribonucleic acids (DNA) or ribonucleic acids (RNA) and polymers thereof in either single- or double-stranded form. Unless specifically limited, the term encompasses nucleic acids containing known analogues of natural nucleotides that have binding properties similar to those of the reference nucleic acid and are metabolized in a manner similar to naturally occurring nucleotides. Unless otherwise indicated, a particular nucleic acid sequence also implicitly encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions), alleles, orthologs, SNPs, and complementary sequences as well as the sequence explicitly indicated. Specifically, degenerate codon substitutions may be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues (Batzer et al., Nucleic Acid Res. 19:5081 (1991); Ohtsuka et al., J. Biol. Chem. 260:2605-2608 (1985); and Rossolini et al., Mol. Cell. Probes 8:91-98 (1994)). The term nucleic acid is used interchangeably with gene, cDNA, and mRNA encoded by a gene.
[0008] The term "gene" or "cellular gene" refers to a nucleic acid fragment that encodes a specific transcription product; it includes regions preceding (5' non-coding) and following (3' non-coding) the coding region that control transcriptional expression, as well as intervening sequences (introns) between individual coding segments (exons).
[0009] The term "dsRNA," or double-stranded RNA, refers to an RNA molecule comprising two hybridized complementary RNA strands in a double-stranded conformation through base pairing interactions. The term "siRNA" refers to a dsRNA that is preferably between 15 and 30, and more preferably between 18 and 24, base pairs long, each strand of which can have a short 3' overhang. Functionally, the characteristic distinguishing an siRNA from other forms of dsRNA is that an siRNA is capable of specifically inhibiting expression of a gene by a process termed "RNA interference" (RNAi) and, due to its small size, does not induce in mammalian cells the interferon and PKR pathways that can lead to non-specific inhibition of gene expression.

[0010] A "library" as used herein refers to a collection of nucleic acid sequences that possesses a common characteristic. For example, a library of nucleic acids can be representative of all possible configurations of a nucleic acid sequence over a defined length. Alternatively, a nucleic acid library may be a collection of sequences that represents a particular subset of the possible sequence configurations of a nucleic acid of a defined length. A library may also represent all or part of the genetic information of a particular organism. A nucleic acid "library" is typically, but not necessarily, cloned into a vector.

[0011] An "siRNA expression library" of the invention is a nucleic acid library that is capable of generating a collection of siRNA molecules by a transcription process.

[0012] "Polypeptide," "peptide," and "protein" are used interchangeably herein to refer to a polymer of amino acid residues. All three terms apply to amino acid polymers in which one or more amino acid residues are an artificial chemical mimetic of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers and non-naturally occurring amino acid polymers. As used herein, the terms encompass amino acid chains of any length, including full-length proteins, wherein the amino acid residues are linked by covalent peptide bonds.

[0013] A "family of proteins" as used herein refers to two or more proteins that carry out similar or related biochemical functions. The members of a family of proteins demonstrate a substantial level of amino acid sequence homology in at least one conserved domain, which typically relates to the functional characteristics of the family. A "family of genes" consists of the genes that encode a family of proteins.

[0014] A "signature motif" as used herein refers to an amino acid sequence characteristic of the members of a family of proteins and is typically found within a highly conserved domain critical for the biological functions of the family of proteins. The length of a signature motif is preferably 5-10, and more preferably 6-8, amino acids. Among the amino acids of a signature motif, typically about 50%, preferably about 60% or more, are constant within all members of the family and the balance are variable. For certain families of proteins, the practice of the present invention may involve the identification of two or more variants of a signature motif, each variant representing the amino acid sequences of a subset of the proteins comprising the family.

[0015] The term "consensus sequence" as used herein defines the set of nucleic acid sequences that encode the amino acid sequences of at least all members of a family of proteins sharing the same signature motif. Typically, there are multiple nucleotide sequences that encode the amino acid sequences of a signature motif, due both to the variability in amino acid sequence within the signature motif itself and to codon degeneracy. A consensus sequence is represented by a formula comprising both constant and variable bases. Among the variable bases, some may be "fully random" (or "random"), i.e., they may be any of the four possible bases. Others may be "partially random", i.e., they may comprise only two or only three predetermined bases of the four possible bases. The length of a consensus sequence may vary depending on the length of the signature motif. Typically, the length is between 15 and 30 nucleotides; more frequently, between 18 and 24 nucleotides.
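One convenient way to hold a consensus sequence in software is as a list with one set of allowed bases per position. The Python sketch below classifies each position using the three categories defined above (constant, partially random, fully random). The helper name `classify` and the short five-position example are hypothetical and serve only as an illustration.

```python
ALL_BASES = frozenset("ACGT")


def classify(position: set) -> str:
    """Label one consensus position as constant, partially or fully random."""
    if not position or not set(position) <= ALL_BASES:
        raise ValueError(f"invalid base set: {position!r}")
    if len(position) == 1:
        return "constant"
    if set(position) == ALL_BASES:
        return "fully random"
    return "partially random"  # exactly two or three of the four bases


# Hypothetical five-position consensus: T, G, (C/T), (A/C/G/T), (A/C/G)
example = [{"T"}, {"G"}, {"C", "T"}, {"A", "C", "G", "T"}, {"A", "C", "G"}]
print([classify(p) for p in example])
# prints ['constant', 'constant', 'partially random', 'fully random', 'partially random']
```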

[0016] Amino acids may be referred to herein by either the commonly known three-letter symbols or by the one-letter symbols recommended by the IUPAC-IUB Biochemical Nomenclature Commission. Nucleotides, likewise, may be referred to by their commonly accepted single-letter codes.

[0017] The term "gene expression" as used herein refers to all processes involved in producing a biologically active agent, which may be a nucleic acid (e.g., an mRNA) or a protein (e.g., an enzyme), from a nucleic acid encoding the biologically active agent. Gene expression includes all post-transcriptional (e.g., RNA splicing) and/or post-translational processing (e.g., post-translational modification such as glycosylation) required to produce the mature agent. Gene expression may be "silenced," "inhibited" or "suppressed" by any means that interrupts the process leading to the production of the biologically active agent, including interruptions at the transcriptional, post-transcriptional, translational, and post-translational levels. For the purpose of the present invention, "post-transcriptional gene silencing" refers to the effect of the siRNA produced in accordance with the invention in suppressing the expression of genes encoding proteins belonging to a family of proteins of interest.

[0018] The term "sense siRNA strand" refers to the siRNA strand that matches the target mRNA sequence. The term "antisense siRNA strand" refers to the siRNA strand that is complementary to the target mRNA sequence.

DETAILED DESCRIPTION OF THE INVENTION 
I. Introduction 

[0019] The present invention provides a novel method for designing and expressing a library of siRNAs wherein the library is optimized to include at least all siRNAs sufficient to functionally silence the genes which encode all members of a predetermined family of proteins. The invention provides for the molecular cloning of the entire library of siRNAs of interest in a single step, and eliminates the high cost involved in the synthesis of individual siRNAs. The method also affords a high degree of flexibility in the design and expression of an siRNA library, allowing the researcher to easily modify the complexity of the library (i.e., increase or decrease its size), depending upon the goals of the research and the information that is available with respect to the genes or protein family of interest. The invention has particular application in genomics research, and may be effectively used in connection with the identification and validation of genes coding for proteins which are known or suspected to be involved in disease processes, including G protein coupled receptors, ion channels, receptor tyrosine kinases, non-receptor tyrosine kinases, nuclear hormone receptors, GTPases, ATPases, serine/threonine kinases, proteases, matrix metalloproteinases (MMPs), GTPase-activating proteins (GAPs), and E3 ubiquitin ligases. Although from a theoretical standpoint a library of the present invention need not be limited in size, practical considerations dictate designing a library with more limited complexity. Typically, a library designed and constructed in accordance with the invention will comprise between 20,000 and 100,000 members, although libraries having as few as 50 members or as many as one million members are also included within the scope of the invention.

II. Identification of a Signature Motif

[0020] The construction of an siRNA expression library in accordance with the present invention requires, as a first step, identifying at least one "signature motif" for the family of proteins of interest. Each signature motif is an amino acid sequence characteristic of the members of the family of proteins and is usually found within a highly conserved domain critical for the biological functions of the members of the family. The highly conserved domain and signature motif may be identified by various means known in the art, including alignment of amino acid and nucleotide sequences and analysis of sequence homology within the family. A.D. Baxevanis et al., Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 2nd ed. (1998). Various tools are available to assist in the identification of a signature motif, including software such as CLUSTALW (Higgins et al., 1996), which may be used with its default parameters or with parameters modified as needed. A signature motif is typically 5-10, and more preferably 6-8, amino acids in length. Among the amino acids comprising a signature sequence, preferably about 50%, more preferably 60% or more, are constant within the members of the family of proteins and the balance are variable.
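Once motif instances have been aligned (for example with a tool such as CLUSTALW, as mentioned above), the share of constant positions can be read off column by column. The Python sketch below assumes the instances are already aligned and gap-free; the function name `conservation_profile` and the three input strings are hypothetical illustrations, not data from the examples.

```python
def conservation_profile(aligned: list) -> tuple:
    """For equal-length aligned motif instances (one per family member),
    return a pattern showing constant residues and 'x' for variable
    columns, plus the fraction of columns that are constant."""
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("need one or more sequences of equal length")
    columns = list(zip(*aligned))
    pattern = "".join(col[0] if len(set(col)) == 1 else "x" for col in columns)
    constant = sum(1 for col in columns if len(set(col)) == 1)
    return pattern, constant / len(columns)


motifs = ["TCEGCKG", "TCGSCKV", "SCDACSG"]  # hypothetical aligned instances
pattern, frac = conservation_profile(motifs)
print(pattern, round(frac, 2))  # prints xCxxCxx 0.29
```

The resulting fraction can then be compared with the preferred ranges stated above when deciding whether a candidate motif is conserved enough.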

[0021] A representative signature motif for the family of nuclear hormone receptors is shown in Example 1. This signature motif is located within the Zinc Finger_C4 domain of the 45 known members of this family of proteins and comprises the amino acid sequence (T/S/A)-C-(D/E/G/N)-(G/S/A)-C-(K/S)-(A/G/S/V), where the second and fifth amino acids of the sequence, C (cysteine), are constant within all members of the family, and the balance are variable. It will be appreciated that the degree of variability of the remaining amino acids is not equal throughout this signature motif. Thus, the first and fourth positions may be filled by any of three amino acids, the third and seventh positions may be filled by any of four amino acids, and the sixth position may be filled by either of two amino acids.
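A signature motif with variable positions maps naturally onto a regular expression, one character class per position. The Python sketch below encodes the motif (T/S/A)-C-(D/E/G/N)-(G/S/A)-C-(K/S)-(A/G/S/V) and scans a protein sequence for it. The pattern name `NR_MOTIF`, the function `find_motif` and the short test sequence are hypothetical illustrations; `re.finditer` reports non-overlapping matches only.

```python
import re

# One character class per motif position:
# (T/S/A)-C-(D/E/G/N)-(G/S/A)-C-(K/S)-(A/G/S/V)
NR_MOTIF = re.compile(r"[TSA]C[DEGN][GSA]C[KS][AGSV]")


def find_motif(protein: str) -> list:
    """Return (0-based start, matched residues) for each non-overlapping hit."""
    return [(m.start(), m.group()) for m in NR_MOTIF.finditer(protein.upper())]


print(find_motif("MKLTCEGCKGFFKR"))  # hypothetical fragment; prints [(3, 'TCEGCKG')]
```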

[0022] For certain families of proteins, e.g., those with a very large number of members, those for which it may not be possible to identify a single signature motif across all members, or those for which designing an siRNA expression library based on a single signature motif would result in a library that would be functionally too complex, the practice of the present invention may involve the identification of two or more variants of a signature motif, with each variant representing the amino acid sequences characteristic of only a subset of the proteins comprising the family of proteins. A representative example is the family of tyrosine kinases, which currently has 89 known members. As shown in Example 2, at least seven variants of a signature motif for this family may be identified, each variant representing a subset of the family as a whole, the subsets comprising as few as two members and as many as 61 members.
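Why splitting a family into motif variants can keep a library manageable is a matter of arithmetic: the complexity of a consensus is the product of the number of allowed bases at each position, so merging variants position by position multiplies their variability, whereas keeping them separate only adds library sizes. The Python sketch below uses two hypothetical six-base consensus variants (toy data, not drawn from Example 2); the helper names are illustrative, and the summed size is an upper bound if the variant libraries happen to share members.

```python
from math import prod


def complexity(consensus):
    """Number of distinct sequences encoded by a per-position base-set list."""
    return prod(len(p) for p in consensus)


def merge(*variants):
    """Single consensus covering all variants: union of bases at each position."""
    return [set().union(*column) for column in zip(*variants)]


# Two hypothetical consensus variants of equal length (illustrative only).
v1 = [{"A"}, {"C", "T"}, {"G"}, {"A", "G"}, {"T"}, {"C"}]
v2 = [{"G"}, {"A"}, {"G"}, {"C", "T"}, {"T"}, {"A", "G"}]

separate = complexity(v1) + complexity(v2)
merged = complexity(merge(v1, v2))
print(separate, merged)  # prints 8 72
```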

III. Determining a Consensus Sequence

[0023] Once a signature motif has been identified, as described above, the signature motif is then "reverse translated" into a "consensus sequence" representing the set of nucleic acid sequences that encode the amino acid sequences of at least all the known proteins sharing the signature motif. The "reverse translation" process may be performed by deducing all possible codons for each amino acid in the signature motif from the genetic code or by extracting the specific coding sequence corresponding to the signature motif for each member of the family from an appropriate sequence database (e.g., GenBank). The length of a consensus sequence may vary depending on the length of the signature motif. Typically, the length is between 15 and 30 nucleotides; more preferably, between 18 and 24 nucleotides.
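The first route to reverse translation mentioned above, deducing codons from the genetic code, can be sketched in Python as follows. The motif is given as one string of allowed one-letter amino acids per position; the sketch collects every codon for those residues from the standard genetic code and keeps, for each of the three codon positions, the union of bases observed. The function names are hypothetical, and the per-position union is a simplifying assumption: it can admit codons that encode none of the motif residues, so its output need not coincide with a consensus built from database sequences.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codons_for(residues):
    """All codons encoding any of the given one-letter amino acids."""
    return [codon for codon, aa in CODON_TABLE.items() if aa in residues]


def reverse_translate(motif):
    """motif: one string of allowed residues per position, e.g. ["TSA", "C"].
    Returns 3 * len(motif) base sets (per-position union over codons)."""
    consensus = []
    for residues in motif:
        codons = codons_for(residues)
        if not codons:
            raise ValueError(f"no codons for {residues!r}")
        for i in range(3):
            consensus.append({codon[i] for codon in codons})
    return consensus


def to_formula(consensus):
    """Render base sets in the document's notation, e.g. TG(C/T)."""
    return "".join(
        next(iter(s)) if len(s) == 1 else "(" + "/".join(sorted(s)) + ")"
        for s in consensus
    )


print(to_formula(reverse_translate(["C", "K"])))  # prints TG(C/T)AA(A/G)
```

For a lone serine, for instance, the union admits TGN and ACN triplets as well as true serine codons, which inflates library complexity; restricting the consensus to codons observed in the known family members avoids this.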

[0024] A consensus sequence may be represented by a formula comprising both fixed and variable bases. Thus, the consensus sequence for the signature motif for the family of nuclear hormone receptors mentioned above and shown in Example 1 is:

[(A/T/G)(C/T)(A/G/T/C)] [TG(T/C)] [(A/G)(A/G)(A/C/G/T)]
[(A/G)(C/G)(A/C/G/T)] [TG(T/C)] [(A/T)(A/C/G)(A/C/G)]
[(A/G)(C/G/T)(A/C/G/T)]

As can be seen, among the variable bases, some may be fully random, i.e., they may be any of the four possible bases, A, C, G or T. Others may be partially random, i.e., they may comprise only two or only three predetermined bases of the four possible bases. Generally, in determining a consensus sequence, all possible codon variations for a given amino acid will be taken into account; however, for various reasons, including the need to limit the complexity (i.e., size) of the siRNA library, the consensus sequence may be restricted to include only the specific codons known to code for the amino acids comprising the known members of the protein family.

[0025] Once a consensus sequence has been determined for a family of proteins, as described above, DNA oligonucleotides may be chemically synthesized in a single batch for all nucleic acid sequences defined by the consensus sequence, and these may be utilized as siRNA coding sequences for incorporation into expression cassettes capable of expressing an siRNA library in accordance with the invention. It will be appreciated that the siRNA library expressed in this manner will be capable of silencing the genes encoding at least all known proteins within the predetermined family of proteins, although the library will also be capable of silencing additional genes which have not yet been identified or that do not exist in nature. Thus, in the above example, the signature motif was determined based upon the amino acid sequences of 45 known members of the family of nuclear hormone receptors. However, the siRNA library that may be expressed based upon the consensus sequence corresponding to this signature motif comprises a significantly larger number of members, due to the partial randomness of the nucleotide coding sequence. In the above example, since there are nine positions that may be filled by any of two bases, four positions that may be filled by any of three bases and four positions that may be filled by any of four bases, the total number of permutations represented by the consensus sequence is $2^9 \times 3^4 \times 4^4$, or 10,616,832. Thus, the siRNA library that will be expressed will have a complexity of 10,616,832 members, and will be capable of silencing not only the genes encoding the known members of the family of nuclear hormone receptors but also the genes encoding as yet unknown members of the family, as well as many other genes matching the consensus sequence, including genes that code for proteins in the other two reading frames and genes that are complementary to the consensus sequence.
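The permutation count above can be reproduced mechanically from the consensus formula of paragraph [0024]. The Python sketch below parses the bracketed notation (each parenthesized group is one variable position, each bare letter one constant base), tallies positions by the number of allowed bases, and multiplies the counts. The parser `parse_consensus` is a hypothetical helper written for this notation only; it ignores the codon brackets.

```python
import re
from math import prod

# Example 1 consensus, written in the notation of paragraph [0024].
CONSENSUS = (
    "[(A/T/G)(C/T)(A/G/T/C)] [TG(T/C)] [(A/G)(A/G)(A/C/G/T)] "
    "[(A/G)(C/G)(A/C/G/T)] [TG(T/C)] [(A/T)(A/C/G)(A/C/G)] "
    "[(A/G)(C/G/T)(A/C/G/T)]"
)


def parse_consensus(formula: str) -> list:
    """One set of allowed bases per position: '(A/G)' is a variable
    position, a bare letter is a constant base; brackets are ignored."""
    body = formula.replace("[", "").replace("]", "").replace(" ", "")
    tokens = re.findall(r"\([ACGT/]+\)|[ACGT]", body)
    return [set(tok.strip("()").split("/")) for tok in tokens]


positions = parse_consensus(CONSENSUS)
sizes = [len(p) for p in positions]
print(len(positions), {k: sizes.count(k) for k in (1, 2, 3, 4)}, prod(sizes))
# prints 21 {1: 4, 2: 9, 3: 4, 4: 4} 10616832
```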

IV. Expression Cassettes 

[0026] Expression cassettes for expressing siRNA libraries in accordance with the 
invention may be constructed by any method known in the art, in particular, methods that 
allow for transcription of both strands of the double-stranded siRNA even when the coding 
sequence comprises partially randomized nucleotides, as is the case with the sequences 
defined by a consensus sequence in accordance with the present invention. 
[0027] A particularly preferred method involves the use of a dual promoter system that allows for ligating the nucleic acid sequence encoding the siRNA between two suitable promoters arranged in opposite orientation. "Opposite orientation" refers to a positioning of the two promoters (see Figure 1) such that one promoter will be operably linked to the "sense" strand of the nucleic acid and the other promoter operably linked to the "anti-sense" strand. When properly positioned, the promoters preferably initiate transcription at the first base encoding the siRNA of interest. Transcription terminates at a specific termination sequence which, when using the preferred pol III type III promoters described below, comprises at least four thymidyl residues located at the end of the siRNA coding sequence, preferably located in the 3' end of the opposite promoter. In addition to a termination sequence, the expression cassette construct can optionally contain a restriction site to ease recovery of the sequence encoding the siRNA. This restriction site is preferably located 5' to the four thymidyl residues and 3' to the TATA box and is created by substitution of existing bases of the promoter sequence, preferably using site-directed mutagenesis techniques as is known in the art. Anywhere from 0 to 20 bases can be modified in the region 5' to the four thymidyl residues and 3' to the TATA box to create restriction sequences, operator sequences or other genetic or cloning elements. The nucleic acid encoding the antisense siRNA strand is synthesized, preferably enzymatically, after the nucleic acid encoding the sense siRNA strand is ligated between the oppositely oriented promoters. Alternatively, the nucleic acid encoding the antisense siRNA strand can be ligated between the oppositely oriented promoters and the nucleic acid encoding the sense siRNA strand can be subsequently synthesized enzymatically. Enzymatic methods for DNA oligonucleotide synthesis frequently employ Klenow, T7, T4, Taq or E. coli DNA polymerase, as described in Sambrook and Russell, Molecular Cloning: A Laboratory Manual, 3d ed. (2001). Methods for construction of dual promoter siRNA expression cassettes are described in U.S. Patent Application Serial No. 10/626,512, the teachings of which are incorporated herein by reference.
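Two sequence-level consequences of the dual promoter arrangement can be sketched in Python: the antisense coding strand is the reverse complement of the sense strand, and, taking the paragraph's statement that a run of four thymidyl residues terminates pol III transcription, a library member whose sense or antisense strand contains such a run would be expected to yield a truncated transcript. The functions and the 21-nt sequence below are hypothetical illustrations, not part of the described cassette design.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense(sense: str) -> str:
    """Reverse complement of a 5'->3' DNA sense coding sequence."""
    return sense.upper().translate(COMPLEMENT)[::-1]


def has_t_run(seq: str, run: int = 4) -> bool:
    """True if seq contains `run` or more consecutive T residues."""
    return "T" * run in seq.upper()


def premature_stop_risk(sense: str) -> bool:
    """Flag members where either transcribed strand contains a T run
    (assumes the four-T pol III termination signal described above)."""
    return has_t_run(sense) or has_t_run(antisense(sense))


sense = "ACTGCAGGCTGTAAGGCAGGA"  # hypothetical 21-nt coding sequence
print(antisense(sense), premature_stop_risk(sense))
# prints TCCTGCCTTACAGCCTGCAGT False
```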

[0028] Alternatively, the expression cassettes may be constructed such that they express hairpin siRNAs (shRNAs) from a single promoter [Paddison, P.J. et al., Genes and Development, 16:948-958 (2002); Brummelkamp, T.R. et al., Science, 296:550-553 (2002)]. Methods for the construction of hairpin siRNA expression cassettes from a partially randomized oligonucleotide are described in U.S. Patent Application Serial No. 10/628,587, the teachings of which are incorporated herein by reference.

[0029] In another embodiment, the siRNA expression cassettes are constructed using the polymerase chain reaction (PCR). Those skilled in the art will recognize that functional pol III promoters can be operably linked to each end of an siRNA coding region by PCR [e.g., see Methods in Molecular Biology, Vol. 15: PCR Protocols: Current Methods and Applications, White, B.A., ed., Humana Press, Inc., Totowa, NJ (1993)]. This approach requires the addition of oligonucleotide extensions to each end of the semi-randomized oligonucleotide to serve as priming sites. The sequence of the oligonucleotide extensions is dependent on the choice of pol III promoters.

[0030] The particular promoters chosen for use in the expression cassettes of the present invention will depend upon which organism or cell type is to be targeted by the siRNA encoded in the expression cassette. For example, if plant cells are to be the target, then plant promoters should be used. The promoters can be constitutive, inducible, or cell dependent, depending on the application and result desired. The promoters do not have to be the same, although they can be. They can be of different types, isolated from different genes, be differentially regulated or differ by as little as one base.

[0031] Preferably the promoters will not require any intragenic promoter elements, so as to allow for the greatest degree of flexibility when designing the coding region of the cassette. The promoters will also preferably not have a requirement for a particular nucleotide at the transcription start-point, although some specificity is tolerable, including a specific requirement for a G or A at the first position by some polymerases. Particularly preferred promoters meeting the above criteria are RNA polymerase III (pol III) promoters of type III, such as the human U6 small nuclear RNA gene promoter and the promoter for human H1 RNA. Such promoters can produce transcripts constitutively without cell type specific expression, although operator sequences can be engineered rendering the promoter inducible. The use of U6 gene transcription signals to produce short RNA molecules in vivo is described by Miyagishi and Taira, Nature Biotechnology, 20:497-500 (2002); Lee, Nan Sook, et al., Nature Biotechnology, 20:500-505 (2002); and Noonberg et al., Nucleic Acids Res., 22:2830-2836 (1995), and the use of H1 RNA promoters is described by Baer et al., Nucleic Acids Res., 18:97-103 (1990) and Hannon et al., J. Biol. Chem., 266:22796-22799 (1991). The preferred promoters mentioned above, such as the U6 promoter and the human H1 promoter, contain all of the cis-acting promoter elements upstream of the transcription start site. These upstream sequence elements include a TATA box (Mattaj et al., Cell, 55:435-442 (1988)), a proximal sequence element (PSE), and in some circumstances a distal sequence element (DSE; Gupta and Reddy, Nucleic Acids Res., 19:2073-2075 (1991)), as shown in Figure 1. Alternatively, tRNA promoters [Kawasaki and Taira, Nucl. Acids Res., 31:700-707 (2003)] and pol II promoters [Xia, H. et al., Nat. Biotechnol., 20:1006-1010 (2002)] may be used.

V. General Recombinant Methods for Constructing siRNA Libraries 

[0032] The construction of expression cassettes suitable for practicing the present invention utilizes methods known to those skilled in the art of molecular biology. In general, the expression cassettes may be ligated into a DNA transfer vector, such as a plasmid, bacteriophage DNA, or a lentiviral, adenoviral, alphaviral, or other viral vector. Prokaryotic or eukaryotic host cells may then be transfected or transduced with an appropriate transfer vector containing genetic material corresponding to an expression cassette in accordance with the present invention, such that the siRNA is transcribed in the host cells. The siRNA expression cassettes can also be delivered directly to the host cells by transfection without prior ligation into a DNA transfer vector [e.g., see Castanotto, D. et al., RNA 8:1454-1460 (2002)].

[0033] In preparing the expression cassettes, the DNA sequences may be inserted or substituted into a bacterial plasmid. Any convenient plasmid may be employed, which will be characterized by having a bacterial replication system, a marker that allows for selection in the bacterium, and generally one or more unique, conveniently located restriction sites. These plasmids, referred to as vectors, may include such vectors as pACYC184, pACYC177, pBR322, pUC9, and their derivatives. A particular plasmid is often chosen based on the nature of the markers, the availability of convenient restriction sites, copy number, and the like. Subsequently, the DNA sequence encoding an siRNA may be inserted into the vector at an appropriate restriction site, and the resulting plasmid is used to transform the E. coli host. After the transformed E. coli is cultured in an appropriate nutrient medium, the bacteria are harvested and lysed, and the plasmid is recovered.

[0034] Basic texts disclosing the general methods for use in connection with this invention include Sambrook and Russell, Molecular Cloning: A Laboratory Manual, 3d ed. (2001); Gelvin et al., eds., Plant Molecular Biology Manual (1990); Kriegler, Gene Transfer and Expression: A Laboratory Manual (1990); and Ausubel et al., Current Protocols in Molecular Biology (1994).

[0035] Chemical synthesis of linear oligonucleotides is well known in the art and can
be carried out by any of several different synthetic procedures, including the phosphoramidite,
phosphite triester, H-phosphonate and phosphotriester methods, typically by automated
synthesis methods. Beaucage and Caruthers, Tetrahedron Lett. 22:1859-1862 (1981);
Needham-VanDevanter et al., Nucleic Acids Res. 12:6159-6168 (1984). Moreover,
oligonucleotides can also be custom-made and ordered from a variety of commercial sources
known to persons of skill in the art. It will be appreciated that in preparing the
oligonucleotides in accordance with the invention, appropriate instructions are provided to
the synthesizer with respect to the randomization of the nucleotides within the consensus
sequence that are not fixed, such that each "wobble" position is randomly filled with one of
the two, one of the three, or one of the four nucleotides allowed for that position as
stipulated by the consensus sequence.
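As an illustration of such synthesizer instructions, the slash notation used for consensus sequences in the Examples below can be mapped onto the standard IUPAC nucleotide ambiguity codes (R = A/G, Y = C/T, N = A/C/G/T, and so on), which many oligonucleotide vendors accept for degenerate positions. The following Python sketch assumes that notation; `parse_consensus` and `to_iupac` are hypothetical helper names, not part of any synthesizer's software.

```python
import re

# Standard IUPAC nucleotide codes, keyed by the set of bases each code allows.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}

# Matches either a wobble group such as "(A/C/G)" or a single fixed base;
# spaces between groups are simply skipped by findall.
TOKEN = re.compile(r"\(([ACGT/ ]+)\)|([ACGT])")


def parse_consensus(consensus: str) -> list[frozenset[str]]:
    """Return the set of allowed bases for each nucleotide position."""
    positions = []
    for group, base in TOKEN.findall(consensus.upper()):
        if group:
            positions.append(frozenset(b.strip() for b in group.split("/")))
        else:
            positions.append(frozenset(base))
    return positions


def to_iupac(consensus: str) -> str:
    """Collapse a slash-notation consensus into one degenerate IUPAC string."""
    return "".join(IUPAC[pos] for pos in parse_consensus(consensus))


# Consensus sequence of Variant 6 in Example 2 (21 nt).
variant_6 = "CA(C/T) AA(A/G) GAC (C/T)T(G/T) GC(A/T) GC(C/G/T) (A/C)G(C/G/T)"
print(to_iupac(variant_6))  # CAYAARGACYTKGCWGCBMGB
```

Whether a particular vendor accepts these codes directly, or instead requires a custom base-mixture definition for each wobble position, depends on that vendor's ordering conventions.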

[0036] The sequence of the isolated and synthetic oligonucleotides utilized in the
practice of the present invention can be verified after cloning using the chain
termination method for sequencing double-stranded templates of Wallace et al., Gene 16:21-26
(1981).
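After sequencing, one simple check is whether each clone carries an insert that the consensus sequence actually permits. The Python sketch below turns a slash-notation consensus into a regular expression and searches both strands of a sequencing read for it; the function names and the choice of example read (SEQ ID NO:93 of Example 2) are illustrative assumptions.

```python
import re

# Complement table used to reverse-complement a DNA read.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Turn e.g. 'CG(C/G)' into the regular expression 'CG[CG]'."""
    parts = []
    for group, base in re.findall(r"\(([ACGT/ ]+)\)|([ACGT])", consensus.upper()):
        if group:
            parts.append("[" + "".join(b.strip() for b in group.split("/")) + "]")
        else:
            parts.append(base)
    return re.compile("".join(parts))


def find_insert(read: str, consensus: str):
    """Return (strand, start, match) for the first hit, or None if absent.

    For the '-' strand, start is an index into the reverse complement of the read.
    """
    pattern = consensus_to_regex(consensus)
    read = read.upper()
    reverse = read.translate(COMPLEMENT)[::-1]
    for strand, seq in (("+", read), ("-", reverse)):
        hit = pattern.search(seq)
        if hit:
            return strand, hit.start(), hit.group()
    return None


# Consensus for Variant 1 of Example 2, checked against SEQ ID NO:93.
variant_1 = "CAC CG(C/G) GAC CT(C/T) AAG TCC AGC"
read = "gtgcccatcctgcaccgggacctcaagtccagcaacattttgctactt"
print(find_insert(read, variant_1))  # ('+', 12, 'CACCGGGACCTCAAGTCCAGC')
```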

VI. Reducing Library Complexity 

[0037] As already indicated, the present invention provides a significant amount of
flexibility with respect to the complexity (number of members) of the siRNA libraries
produced in accordance with the invention. This flexibility is a result of the ability to modify
a number of parameters involved in the design and construction of such libraries. Included
among these parameters are the length of the signature motif and the number of amino acid
positions within the signature motif that are constant for all members. Thus, a shorter
signature motif (e.g., six amino acids rather than seven) or one that has a larger number of
amino acids that are constant (e.g., five rather than three or four) will generally "reverse
translate" into a consensus sequence having a larger percentage of bases that are constant,
and as a consequence, a library generated on the basis of such a consensus sequence will have
fewer members. Similarly, the complexity of a library may also be reduced by truncating the
consensus sequence (e.g., by eliminating one or more nucleotide positions at either the 3'
end or 5' end of the sequence, as illustrated in Example 1 below), or, as already indicated, by
limiting the randomness of the nucleotides comprising a consensus sequence by utilizing
only those codons that encode the amino acid sequences of known members of the family of
proteins of interest, rather than all possible codons based upon the degeneracy of the genetic
code.
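The effect of reverse translating with only the codons observed in known family members, rather than every codon permitted by the genetic code, can be quantified directly. The following Python sketch assumes, as in the consensus sequences of the Examples, that each nucleotide position allows the union of the bases seen at that position. It uses the three Variant 1 sequences of Example 2 (SEQ ID NOS:91, 93 and 95), whose signature motif H R D L K S S begins at nucleotide 13 (index 12); the helper names are hypothetical.

```python
from itertools import product
from math import prod

# Standard genetic code: codons in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def columns(codons: list[str]) -> list[set[str]]:
    """Per-position union of bases across a list of codons (three sets)."""
    return [{c[i] for c in codons} for i in range(3)]


def complexity(cols: list[set[str]]) -> int:
    """Number of distinct sequences allowed: product of bases per position."""
    return prod(len(s) for s in cols)


def observed_columns(members: list[str], start: int, n_codons: int) -> list[set[str]]:
    """Consensus built only from codons actually present in known members."""
    cols = []
    for k in range(n_codons):
        i = start + 3 * k
        cols += columns([m[i:i + 3].upper() for m in members])
    return cols


def all_codon_columns(motif: str) -> list[set[str]]:
    """Consensus built from every codon the genetic code allows for each residue."""
    cols = []
    for aa in motif:
        cols += columns([c for c, a in GENETIC_CODE.items() if a == aa])
    return cols


members = [
    "gttcccatcatccaccgcgaccttaagtccagcaacatattgatcctc",  # SEQ ID NO:91
    "gtgcccatcctgcaccgggacctcaagtccagcaacattttgctactt",  # SEQ ID NO:93
    "gtgcccatcctgcaccgggacctcaagtccagcaacattttgctactt",  # SEQ ID NO:95
]
print(complexity(observed_columns(members, start=12, n_codons=7)))  # 4
print(complexity(all_codon_columns("HRDLKSS")))                     # 131072
```

Restricting to observed codons reduces this motif from 131,072 possible 21-nucleotide sequences to the 4 listed for sub-set 1 in Table 1. A per-position union, like the slash notation itself, can still admit combinations not present in any single codon set (for serine it admits TGC and ACC, for example).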

[0038] An additional and effective way to reduce the complexity of a library is to
divide the members of a protein family of interest into two or more sub-sets, each sub-set
comprising members having a variant of the signature sequence, each such variant
comprising a relatively high number of amino acids that are constant for all members of the
sub-set. The effect of such division can be seen clearly with reference to Example 2 and
Table 1 below, which show the division of the 89 known members of the family of tyrosine
kinases into seven sub-sets. Each of sub-sets 1 and 4-7 has a different variant of the
signature motif, but all five comprise seven amino acids that are constant for all members of
the respective sub-set. Sub-set 3 has a variant signature sequence in which only one of the
seven amino acids is not constant for all members of the sub-set, and only sub-set 2 has a
variant signature motif in which three of the amino acids are not constant for all members.



Table 1

Variant   Signature Motif          No. of Known Members   Complexity
1         H R D L K S S             3                          4
2         H R D/N I/V/L A A/V R     3                      2,304
3         H R D L R A/S A           8                     10,368
4         H R/K D L A T R           9                      2,592
5         H R D L A A R            61                      8,192
6         H K D L A A R             3                        576
7         H R D I A A R             2                         32
Total                              89                     24,068



[0039] As a consequence of this division of the family into seven sub-sets, and as a
further consequence of the fact that only known codons are taken into account when
translating each of the variants of the signature motif into a consensus sequence, the total
complexity of the library is significantly reduced. In the case of the family of tyrosine
kinases, were an siRNA library to be produced without this division, the complexity of the
library would be on the order of tens of millions of members. As can be seen from Table 1,
when the family is divided into the seven sub-sets listed in the table, the result is a library
having only 24,068 members. It will be appreciated that such a library is formed by
combining all the DNA oligonucleotides synthesized on the basis of each of the seven
consensus sequences and ligating these to the expression cassettes; in a preferred
embodiment, in order to obtain a uniform complexity of 24,068 members, the seven batches
of oligonucleotides are mixed together in direct proportion to their complexity prior to
incorporation in the cassettes.
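The proportional mixing described above can be expressed as a short calculation. Assuming, for illustration only, that the seven oligonucleotide batches are supplied at the same molar concentration, mixing them in proportion to their Table 1 complexities gives every unique member of the 24,068-member library the same expected share of the pool. A Python sketch:

```python
from fractions import Fraction

# Complexity of each sub-set consensus sequence, from Table 1.
TABLE_1_COMPLEXITY = {1: 4, 2: 2304, 3: 10368, 4: 2592, 5: 8192, 6: 576, 7: 32}


def mixing_fractions(complexity: dict[int, int]) -> dict[int, Fraction]:
    """Fraction of the final pool contributed by each batch (proportional to complexity)."""
    total = sum(complexity.values())
    return {k: Fraction(v, total) for k, v in complexity.items()}


shares = mixing_fractions(TABLE_1_COMPLEXITY)
total = sum(TABLE_1_COMPLEXITY.values())
print(total)  # 24068
for subset, share in shares.items():
    # Dividing a batch's share by its complexity gives the share of each member,
    # which works out to 1/24068 for every sub-set.
    per_member = share / TABLE_1_COMPLEXITY[subset]
    print(subset, f"{float(share):.4%}", per_member)
```

With these proportions, sub-set 3 contributes about 43% of the pool and sub-set 1 only about 0.017%.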

Utilizing any of the techniques described herein and in the Examples, it is possible to
design efficient siRNA libraries comprising as few as 50 unique members or as many as one
million or more members, although typically most libraries will be within the range of 20,000
to 100,000 unique members.

VII. Recombinant Vectors

[0040] The siRNA expression cassettes in accordance with the present invention may
be incorporated in a vector that is capable of self-replication in host cells. As one of ordinary
skill in the art would recognize, a large variety of such vectors may be suitable for use in
connection with the present invention. Certain types of vectors allow the expression cassettes
to be amplified. Other types of vectors are necessary for efficient introduction of the
expression cassettes into cells and for their stable expression once introduced. Any vector capable
of accepting a DNA expression cassette of the present invention is contemplated as a suitable
recombinant vector for the purposes of the invention. The vector may be any circular or
linear length of DNA that either integrates into the host genome or is maintained in episomal
form. Vectors may require additional manipulation or particular conditions to be efficiently
introduced into a host cell (e.g., many expression plasmids), or can be part of a self-
integrating, cell-specific system, such as a recombinant virus.

[0041] Infection of cells with a viral vector is a preferred method for introducing the
siRNA expression libraries of the present invention into cells. Exemplary mammalian viral
vector systems include adenoviral vectors, adeno-associated type 1 ("AAV-1") or adeno-
associated type 2 ("AAV-2") viral vectors, hepatitis delta vectors, live, attenuated delta
viruses, herpes viral vectors, alphaviral vectors, or retroviral vectors (including lentiviral
vectors).


[0042] The siRNA expression libraries in accordance with the invention may also be
introduced into a host cell by transfection and other physical methods as are known in the art.

VIII. Uses for the Invention

[0043] One of the main applications of the present invention is the use of a library of
siRNAs targeting a predetermined gene family to identify genes involved in
disease processes, utilizing techniques such as Inverse Genomics®. In general terms, these
techniques involve transfecting or transducing a population of cells with the siRNA
expression library and monitoring the population of cells for any phenotypic change, such as
a decrease or increase in expression of mRNA, proliferation, differentiation, apoptosis, or
senescence. For example, an siRNA library targeting the tyrosine kinase family can be
used to identify tyrosine kinases that function in the normal apoptotic pathway as follows.
The library is delivered to a population of cells by transduction with a retroviral vector. The
transduced cells are then subjected to a stimulus that induces apoptosis in normal cells (e.g.,
treatment with etoposide, cisplatin, or ionizing radiation). The majority of the treated cells
will die as a result of this treatment. However, if a tyrosine kinase participates in the apoptotic
pathway downstream of the stimulus, then cells expressing an siRNA against this tyrosine
kinase will survive due to the siRNA-mediated defect in the apoptotic pathway. siRNA
expression cassettes are rescued from the surviving cells by PCR or other methods known to
those skilled in the art. Putative tyrosine kinases that function in the apoptotic pathway are
then identified from the siRNA sequences.
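Once siRNA sequences have been recovered from surviving cells, each can be compared with the coding sequences of the targeted family to nominate candidate genes. A minimal Python sketch, assuming the rescued sequence is the 21-nucleotide sense-strand (mRNA-identical) segment and that family members are supplied as a dictionary of coding sequences; here two Variant 1 fragments from Example 2 stand in for full-length kinase sequences, and `rank_targets` and its mismatch threshold are hypothetical choices.

```python
def mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def rank_targets(sirna: str, references: dict[str, str], max_mismatches: int = 2):
    """Return (name, position, mismatches) for each reference the siRNA matches.

    Each reference is scanned with a sliding window the length of the siRNA;
    the best-matching window is kept if it is within max_mismatches.
    """
    sirna = sirna.upper()
    n = len(sirna)
    hits = []
    for name, seq in references.items():
        seq = seq.upper()
        windows = [(mismatches(sirna, seq[i:i + n]), i) for i in range(len(seq) - n + 1)]
        if not windows:
            continue  # reference shorter than the siRNA
        best_mm, best_pos = min(windows)
        if best_mm <= max_mismatches:
            hits.append((name, best_pos, best_mm))
    return sorted(hits, key=lambda hit: hit[2])


references = {
    "SEQ ID NO:91": "gttcccatcatccaccgcgaccttaagtccagcaacatattgatcctc",
    "SEQ ID NO:93": "gtgcccatcctgcaccgggacctcaagtccagcaacattttgctactt",
}
rescued = "caccgggacctcaagtccagc"
print(rank_targets(rescued, references))
# [('SEQ ID NO:93', 12, 0), ('SEQ ID NO:91', 12, 2)]
```

Allowing a few mismatches reflects that a library member may also silence closely related family members; the exact threshold is a design choice, not one stated here.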

[0044] The level of gene expression may also be determined at the protein level.
Various immunological assays are routinely used by those skilled in the art to measure the
level of a gene product, particularly using polyclonal or monoclonal antibodies that react
specifically with a protein product. In addition, functional assays may also be performed to
confirm the suppressed expression of one or more genes in transfected/transduced cells.
Depending on the particular gene family and the known biological functions the gene
products normally exert, specific assays can be designed for detecting a decreased level of
activity. For example, when the targeted gene family encodes enzymes, specific enzymatic
assays can be carried out using suitable substrates to detect the enzymatic activity in the
transfected or transduced cells. When the targeted genes encode kinases, for instance, the
lack of kinase activity in transfected/transduced cells may be reflected in a reduced level of
phosphorylation of the substrates; when the targeted genes encode receptors, such as cytokine
receptors, the diminished gene expression may be reflected in a reduced response to the
ligands; when the targeted genes encode tumor suppressors or oncogenes, the decreased gene
expression may be reflected in changes, e.g., in the tumorigenic tendency and/or metastatic
potential of the transfected or transduced cells. Other possible changes in phenotype that
may indicate reduced gene expression include: viral susceptibility (HIV infection);
autoimmunity (inactivation of lymphocytes); drug sensitivity (drug toxicity and efficacy);
graft rejection (MHC antigen presentation), etc.

[0045] All publications and patent applications cited in this specification are herein
incorporated by reference as if each individual publication or patent application were
specifically and individually indicated to be incorporated by reference.

[0046] Although the foregoing invention has been described in some detail by way of
illustration and example for clarity and understanding, it will be readily apparent to one of
ordinary skill in the art in light of the teachings of this invention that certain changes and
modifications may be made thereto without departing from the spirit and scope of the
appended claims.

[0047] As can be appreciated from the disclosure provided above, the present
invention has a wide variety of applications. Accordingly, the following examples are
offered for illustration purposes and are not intended to be construed as a limitation on the
invention in any way. Those of skill in the art will readily recognize a variety of nonessential
parameters that could be changed or modified to yield essentially similar results.

EXAMPLES



[0048] The symbols for amino acids used in the examples are as follows:

A Alanine
C Cysteine
D Aspartic acid
E Glutamic acid
F Phenylalanine
G Glycine
H Histidine
I Isoleucine
K Lysine
L Leucine
M Methionine
N Asparagine
P Proline
Q Glutamine
R Arginine
S Serine
T Threonine
V Valine
W Tryptophan
Y Tyrosine



Example 1 

Family of human nuclear hormone receptors (ZnF_C4 domain) - 45 members

In this example, a single signature motif was designed based on the zinc finger
domain present in all 45 known members of the nuclear hormone receptor family. A short
segment of the zinc finger domain present in each of the 45 known family members is shown
below. The consensus sequence was "reverse translated" utilizing only those codons that
encode the signature motif region of known members of the family. Using a full 21-
nucleotide consensus sequence to construct the siRNA library, the complexity would be
10,616,832. By reducing the length of the consensus sequence to 19 nucleotides, the
complexity is reduced to 884,736. siRNAs as short as 19 nucleotides are highly efficient at
reducing their cognate mRNA levels [Czauderna, F. et al., Nucl. Acids Res. 31: 2705-2716
(2003)]; therefore, reducing the length of the consensus sequence will have little, if any,
effect on the degree of silencing produced by members of the library.

tataatgcactgacctgtgaggggtgtaaaggtttcttcaggaga (SEQ ID NO:1)
Y N A L T C E G C K G F F R R (SEQ ID NO:2)

tacggcgtgcgcacctgtgagggctgcaaaggcttctttaagcgc (SEQ ID NO:3)
Y G V R T C E G C K G F F K R (SEQ ID NO:4)

tacggcgtgcgcacctgtgagggctgcaaaggcttctttaagcgc (SEQ ID NO:5)
Y G V R T C E G C K G F F K R (SEQ ID NO:6)

tacggcgtgcgaacctgcgagggctgcaagggctttttcaagaga (SEQ ID NO:7)
Y G V R T C E G C K G F F K R (SEQ ID NO:8)

tatggtgtccgcacatgtgagggctgcaagggcttcttcaagcgc (SEQ ID NO:9)
Y G V R T C E G C K G F F K R (SEQ ID NO:10)

tatggagcagtaacttgtgaaggctgcaaaggattttttaaaaga (SEQ ID NO:11)
Y G A V T C E G C K G F F K R (SEQ ID NO:12)

tacggggttatcacctgtgaggggtgcaagggcttcttccgccgg (SEQ ID NO:13)
Y G V I T C E G C K G F F R R (SEQ ID NO:14)

tacggagtcatcacatgtgaaggctgcaagggattctttaggagg (SEQ ID NO:15)
Y G V I T C E G C K G F F R R (SEQ ID NO:16)

tatggtgtcattacatgtgaaggctgcaagggctttttcaggaga (SEQ ID NO:17)
Y G V I T C E G C K G F F R R (SEQ ID NO:18)

tatggagtgtacagctgcgaggggtgcaagggcttcttcaagcgg (SEQ ID NO:19)
Y G V Y S C E G C K G F F K R (SEQ ID NO:20)

tacggggtttacagctgtgagggttgcaagggcttcttcaaacgc (SEQ ID NO:21)
Y G V Y S C E G C K G F F K R (SEQ ID NO:22)

tacggggtatacagttgtgaaggctgcaaagggttcttcaagagg (SEQ ID NO:23)
Y G V Y S C E G C K G F F K R (SEQ ID NO:24)

tacaacgtgctcagctgcgaaggctgcaagggcttcttccggcgc (SEQ ID NO:25)
Y N V L S C E G C K G F F R R (SEQ ID NO:26)

tacaatgttctgagctgcgagggctgcaagggattcttccgccgc (SEQ ID NO:27)
Y N V L S C E G C K G F F R R (SEQ ID NO:28)

tatgggatcatctcctgtgagggctgcaaagggtttttcaagcgg (SEQ ID NO:29)
Y G I I S C E G C K G F F K R (SEQ ID NO:30)

tatggggtcagctcttgtgaaggctgcaagggcttctttcgccga (SEQ ID NO:31)
Y G V S S C E G C K G F F R R (SEQ ID NO:32)

tatggggtcagctcttgtgaaggctgcaagggcttctttcgccga (SEQ ID NO:33)
Y G V S S C E G C K G F F R R (SEQ ID NO:34)

tatggggctgtcagttgtgaaggttgcaaaggtttcttcaaaagg (SEQ ID NO:35)
Y G A V S C E G C K G F F K R (SEQ ID NO:36)

tacggtgtcttcacctgcgagggctgcaagagctttttcaagcga (SEQ ID NO:37)
Y G V F T C E G C K S F F K R (SEQ ID NO:38)

tacggccagttcacgtgcgagggctgcaagagcttcttcaagcgc (SEQ ID NO:39)
Y G Q F T C E G C K S F F K R (SEQ ID NO:40)

tacggggtctacgcctgcgacggctgctcaggttttttcaaacgg (SEQ ID NO:41)
Y G V Y A C D G C S G F F K R (SEQ ID NO:42)

tatggcatctatgcctgcaacggctgcagcggcttcttcaagagg (SEQ ID NO:43)
Y G I Y A C N G C S G F F K R (SEQ ID NO:44)

tatggggcatccacctgtgatgggtgcaagggtttcttcagacgc (SEQ ID NO:45)
Y G A S T C D G C K G F F R R (SEQ ID NO:46)

tacggtgcctcgagctgtgacggctgcaagggcttcttccggagg (SEQ ID NO:47)
Y G A S S C D G C K G F F R R (SEQ ID NO:48)

tatggggtcagcgcctgtgagggctgcaagggcttcttccgccgc (SEQ ID NO:49)
Y G V S A C E G C K G F F R R (SEQ ID NO:50)

tatggggtcagcgcctgtgagggatgtaagggctttttccgcaga (SEQ ID NO:51)
Y G V S A C E G C K G F F R R (SEQ ID NO:52)

tacggtgtgcacgcctgcgagggctgcaagggctttttccgtcgg (SEQ ID NO:53)
Y G V H A C E G C K G F F R R (SEQ ID NO:54)

tatggagttcatgcttgcgaaggctgtaagggtttctttcggaga (SEQ ID NO:55)
Y G V H A C E G C K G F F R R (SEQ ID NO:56)

tacggtgttcatgcatgtgaggggtgcaagggcttcttccgtcgt (SEQ ID NO:57)
Y G V H A C E G C K G F F R R (SEQ ID NO:58)

tacggagtccacgcgtgtgaaggctgcaagggcttctttcggcga (SEQ ID NO:59)
Y G V H A C E G C K G F F R R (SEQ ID NO:60)

tatggagttcatgcttgtgaaggatgcaagggtttcttccggaga (SEQ ID NO:61)
Y G V H A C E G C K G F F R R (SEQ ID NO:62)

ttcaatgtcatgacatgtgaaggatgcaagggctttttcaggagg (SEQ ID NO:63)
F N V M T C E G C K G F F R R (SEQ ID NO:64)

tttaatgcgctgacttgtgagggctgcaagggtttcttcaggaga (SEQ ID NO:65)
F N A L T C E G C K G F F R R (SEQ ID NO:66)

taccgctgtatcacgtgtgaaggctgcaagggtttctttagaaga (SEQ ID NO:67)
Y R C I T C E G C K G F F R R (SEQ ID NO:68)

taccgctgtatcacttgtgagggctgcaagggcttctttcgccgc (SEQ ID NO:69)
Y R C I T C E G C K G F F R R (SEQ ID NO:70)

tacggactgctcacgtgtgagagctgcaagggcttcttcaagcgc (SEQ ID NO:71)
Y G L L T C E S C K G F F K R (SEQ ID NO:72)

tatgggctcctcacctgtgaaagctgcaagggattttttaagcga (SEQ ID NO:73)
Y G L L T C E S C K G F F K R (SEQ ID NO:74)

tatggggtagtcacctgtggcagctgcaaagttttcttcaaaaga (SEQ ID NO:75)
Y G V V T C G S C K V F F K R (SEQ ID NO:76)

tatggagctctcacatgtggaagctgcaaggtcttcttcaaaaga (SEQ ID NO:77)
Y G A L T C G S C K V F F K R (SEQ ID NO:78)

tatggtgtccttacctgtgggagctgtaaggtcttctttaagagg (SEQ ID NO:79)
Y G V L T C G S C K V F F K R (SEQ ID NO:80)

tatggagtcttaacttgtggaagctgtaaagttttcttcaaaaga (SEQ ID NO:81)
Y G V L T C G S C K V F F K R (SEQ ID NO:82)

tacggcgtggcctcctgcgaggcttgcaaggccttcttcaagagg (SEQ ID NO:83)
Y G V A S C E A C K A F F K R (SEQ ID NO:84)

tatggtgtggcatcctgtgaggcctgcaaagccttcttcaagagg (SEQ ID NO:85)
Y G V A S C E A C K A F F K R (SEQ ID NO:86)

tatggagtctggtcctgtgagggctgcaaggccttcttcaagaga (SEQ ID NO:87)
Y G V W S C E G C K A F F K R (SEQ ID NO:88)

tatggagtctggtcgtgtgaaggatgtaaggccttttttaaaaga (SEQ ID NO:89)
Y G V W S C E G C K A F F K R (SEQ ID NO:90)

Signature Motif:

(T/S/A)-C-(D/E/G/N)-(G/S/A)-C-(K/S)-(A/G/S/V)

Consensus sequence (21 nt):

(A/T/G)(C/T)(A/G/T/C) TG(T/C) (A/G)(A/G)(A/C/G/T)
T/S/A                 C       D/E/G/N

(A/G)(C/G)(A/C/G/T) TG(T/C) (A/T)(A/C/G)(A/C/G)
G/S/A               C       K/S

(A/G)(C/G/T)(A/C/G/T)
G/S/V/A

Complexity: 2^9 x 3^4 x 4^4 = 512 x 81 x 256 = 10,616,832 members

Consensus sequence (19 nt):

(A/T/G)(C/T)(A/G/T/C) TG(T/C) (A/G)(A/G)(A/C/G/T)
T/S/A                 C       D/E/G/N

(A/G)(C/G)(A/C/G/T) TG(T/C) (A/T)(A/C/G)(A/C/G)
G/S/A               C       K/S

(A/G) - -
G/S/V/A

Complexity: 2^9 x 3^3 x 4^3 = 512 x 27 x 64 = 884,736 members
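Both complexity figures follow from counting the allowed bases at each position of the consensus sequence and multiplying. The Python sketch below assumes the slash notation shown above (whitespace ignored) and models truncation to 19 nt as dropping the last two positions, as in the 19-nt consensus; the function names are illustrative.

```python
import re
from collections import Counter
from math import prod
from typing import Optional

# A wobble group such as "(A/C/G)" or a single fixed base.
TOKEN = re.compile(r"\(([ACGT/ ]+)\)|([ACGT])")


def bases_per_position(consensus: str) -> list[int]:
    """Number of allowed nucleotides at each position of a slash-notation consensus."""
    return [
        len({b.strip() for b in group.split("/")}) if group else 1
        for group, _base in TOKEN.findall(consensus.upper())
    ]


def library_complexity(consensus: str, length: Optional[int] = None) -> int:
    """Product of allowed bases per position, optionally truncated at the 3' end."""
    counts = bases_per_position(consensus)
    if length is not None:
        counts = counts[:length]
    return prod(counts)


# 21-nt consensus sequence of Example 1.
example_1 = ("(A/T/G)(C/T)(A/G/T/C) TG(T/C) (A/G)(A/G)(A/C/G/T) "
             "(A/G)(C/G)(A/C/G/T) TG(T/C) (A/T)(A/C/G)(A/C/G) "
             "(A/G)(C/G/T)(A/C/G/T)")

# (fold, number of positions): four fixed, nine 2-fold, four 3-fold, four 4-fold.
print(sorted(Counter(bases_per_position(example_1)).items()))  # [(1, 4), (2, 9), (3, 4), (4, 4)]
print(library_complexity(example_1))      # 10616832
print(library_complexity(example_1, 19))  # 884736
```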

Example 2 

Family of tyrosine kinases - 89 members 

This example shows the identification of seven variants of a portion of the catalytic
domain of the family of tyrosine kinases. As shown in Table 1 above, these may then be used
for the production of a library of siRNAs targeting this domain having a reduced complexity of
24,068 unique members.

Variant 1: 3 members

gttcccatcatccaccgcgaccttaagtccagcaacatattgatcctc (SEQ ID NO:91)
V P I I H R D L K S S N I L I L (SEQ ID NO:92)

gtgcccatcctgcaccgggacctcaagtccagcaacattttgctactt (SEQ ID NO:93)
V P I L H R D L K S S N I L L L (SEQ ID NO:94)

gtgcccatcctgcaccgggacctcaagtccagcaacattttgctactt (SEQ ID NO:95)
V P I L H R D L K S S N I L L L (SEQ ID NO:96)



Signature Motif: H R D L K S S 



Consensus Sequence: 

CAC CG(C/G) GAC CT(C/T) AAG TCC AGC 
H R D L K S S 

Complexity: 2^2 = 4 members



Variant 2: 3 members

catggtatggtgcatagaaacctggctgcccgaaacgtgctactcaag (SEQ ID NO:97)
H G M V H R N L A A R N V L L K (SEQ ID NO:98)

aagaattgcatccaccgggacgtggcagcgcgtaacgtgctgttgacc (SEQ ID NO:99)
K N C I H R D V A A R N V L L T (SEQ ID NO:100)

atcaactgcgtgcacagggacattgctgtccggaacatcctggtggcc (SEQ ID NO:101)
I N C V H R D I A V R N I L V A (SEQ ID NO:102)

Signature Motif: H R D/N I/V/L A A/V R
Consensus Sequence:

CA(T/C) (C/A)G(G/A) (G/A)AC (A/C/G)T(T/G) GC(T/A) G(T/C)(C/G)
H       R           D/N     I/V/L         A       A/V

CG(A/T/G)
R

Complexity: 2^8 x 3^2 = 256 x 9 = 2,304 members

Variant 3: 8 members

atgaactacgtccaccgggaccttcgtgcagccaacatcctggtggga (SEQ ID NO:103)
M N Y V H R D L R A A N I L V G (SEQ ID NO:104)

atgaactatattcaccgagatcttcgggctgctaatattcttgtagga (SEQ ID NO:105)
M N Y I H R D L R A A N I L V G (SEQ ID NO:106)

atgaattatatccatagagatctgcgatcagcaaacattctagtgggg (SEQ ID NO:107)
M N Y I H R D L R S A N I L V G (SEQ ID NO:108)

atgaactacattcaccgcgacctgagggcagccaacatcctggttggg (SEQ ID NO:109)
M N Y I H R D L R A A N I L V G (SEQ ID NO:110)

aagaattccatccaccgcgacctgcgggcggccaacatcctggtgtct (SEQ ID NO:111)
M N S I H R D L R A A N I L V S (SEQ ID NO:112)

aggaactacatccaccgagacctccgagctgccaacatcttggtctt (SEQ ID NO:113)
R N Y I H R D L R A A N I L V S (SEQ ID NO:114)

aagaactacattcaccgggacctgcgagcagctaatgttctggtctcc (SEQ ID NO:115)
K N Y I H R D L R A A N V L V S (SEQ ID NO:116)

cggaattatattcatcgtgaccttcgggctgccaacattctggtgtct (SEQ ID NO:117)
R N Y I H R D L R A A N I L V S (SEQ ID NO:118)

Signature Motif: H R D L R A/S A
Consensus Sequence:

CA(C/T) (C/A)G(A/C/G/T) GA(C/T) CT(C/G/T) (A/C)G(A/G/T)
H       R               D       L         R

(G/T)C(A/G/T) GC(A/C/T)
A/S           A

Complexity: 2^5 x 3^4 x 4 = 32 x 81 x 4 = 10,368 members



Variant 4: 9 members

ctgcattttgtgcaccgggacctggccacacgcaactgtctagtggg (SEQ ID NO:119)
L H F V H R D L A T R N C L V G (SEQ ID NO:120)

ctcaactttgtacatcgggacctggccacgcggaactgcctagttggg (SEQ ID NO:121)
L N F V H R D L A T R N C L V G (SEQ ID NO:122)

cttaattttgttcaccgagatctggccacacgaaactgtttagtgggt (SEQ ID NO:123)
L N F V H R D L A T R N C L V G (SEQ ID NO:124)

cgcgggctggtgcaccgagacctcgctacgcgcaacctactgctggcg (SEQ ID NO:125)
R G L V H R D L A T R N L L L A (SEQ ID NO:126)

aaaaggtatatccacagggatctggcaacgagaaatatattggtggag (SEQ ID NO:127)
K R Y I H R D L A T R N I L V E (SEQ ID NO:128)

cagcactttgtgcaccgagacctggccaccaggaactgcctggttgga (SEQ ID NO:129)
Q H F V H R D L A T R N C L V G (SEQ ID NO:130)

cagcacttcgtgcaccgcgatttggccaccaggaactgcctggtcggg (SEQ ID NO:131)
Q H F V H R D L A T R N C L V G (SEQ ID NO:132)

caccacgtggttcacaaggacctggccacccgcaatgtgctagtgtac (SEQ ID NO:133)
H H V V H K D L A T R N V L V Y (SEQ ID NO:134)

cgtaagtttgttcaccgagatttagccaccaggaactgcctggtgggc (SEQ ID NO:135)
R K F V H R D L A T R N C L V G (SEQ ID NO:136)

Signature Motif: H R/K D L A T R
Consensus Sequence:

CAC (A/C)(A/G)(A/C/G) GA(C/T) (C/T)T(A/C/G) GC(A/C/T)
H   R/K               D       L             A

AC(A/C/G) (A/C)G(A/C/G)
T         R

Complexity: 2^5 x 3^4 = 32 x 81 = 2,592 members

Variant 5: 61 members

aagaagcttgtgcaccgcgacctggccgcccgcaacatcctggtctca (SEQ ID NO:137)
K K L V H R D L A A R N I L V S (SEQ ID NO:138)

aagaagcttgtgcaccgggacctagccgcccgcaacatcctggtctca (SEQ ID NO:139)
K K L V H R D L A A R N I L V S (SEQ ID NO:140)

aacaatttcgtgcatcgagacctggctgcccgcaatgtgctggtgtct (SEQ ID NO:141)
N N F V H R D L A A R N V L V S (SEQ ID NO:142)

cacgactacatccaccgagacctagccgcgcgcaacgtgctgctggac (SEQ ID NO:143)
H D Y I H R D L A A R N V L L D (SEQ ID NO:144)

cggcaatacgttcaccgggacttggcagcaagaaatgtccttgttgag (SEQ ID NO:145)
R Q Y V H R D L A A R N V L V E (SEQ ID NO:146)

cgtcgcttggtgcaccgcgacctggcagccaggaacgtactggtgaaa (SEQ ID NO:147)
R R L V H R D L A A R N V L V K (SEQ ID NO:148)

cggaacttcatccaccgagacctggctgctcggaattgcatgctggca (SEQ ID NO:149)
R N F I H R D L A A R N C M L A (SEQ ID NO:150)

aagaagtgcatacaccgagacctggcagccaggaatgtcctggtgaca (SEQ ID NO:151)
K K C I H R D L A A R N V L V T (SEQ ID NO:152)

caaaaatgtattcatcgagatttagcagccagaaatgttttggtaaca (SEQ ID NO:153)
Q K C I H R D L A A R N V L V T (SEQ ID NO:154)

cagaagtgcatccacagggacctggctgcccgcaatgtgctggtgacc (SEQ ID NO:155)
Q K C I H R D L A A R N V L V T (SEQ ID NO:156)

cagaagtgtattcacagagacttggctgccagaaacgtcctggtgacc (SEQ ID NO:157)
Q K C I H R D L A A R N V L V T (SEQ ID NO:158)

cggaagtgtatccaccgggacctggctgcccgcaatgtgctggtgact (SEQ ID NO:159)
R K C I H R D L A A R N V L V T (SEQ ID NO:160)

atgaagctcgttcatcgggacttggcagccagaaacatcctggtagct (SEQ ID NO:161)
M K L V H R D L A A R N I L V A (SEQ ID NO:162)

agaaagtgcattcatcgggacctggcagcgagaaacattcttttatct (SEQ ID NO:163)
R K C I H R D L A A R N I L L S (SEQ ID NO:164)

cgaaagtgcatccacagagacctggctgctcggaacattctgctgtcg (SEQ ID NO:165)
R K C I H R D L A A R N I L L S (SEQ ID NO:166)

cgaaagtgtatccacagggacctggcggcacgaaatatcctcttatcg (SEQ ID NO:167)
R K C I H R D L A A R N I L L S (SEQ ID NO:168)

aagaactgcgtccacagagacctggcggctaggaacgtgctcatctgt (SEQ ID NO:169)
K N C V H R D L A A R N V L I C (SEQ ID NO:170)

aaaaattgtgtccaccgtgatctggctgctcgcaacgtcctcctggca (SEQ ID NO:171)
K N C V H R D L A A R N V L L A (SEQ ID NO:172)

aagaattgtattcacagagacttggcagccagaaatatcctccttact (SEQ ID NO:173)
K N C I H R D L A A R N I L L T (SEQ ID NO:174)

aagtcgtgtgttcacagagacctggccgccaggaacgtgcttgtcacc (SEQ ID NO:175)
K S C V H R D L A A R N V L V T (SEQ ID NO:176)

aaacagtttattcacagggacctagctgccaggaacattttagttggt (SEQ ID NO:177)
K Q F I H R D L A A R N I L V G (SEQ ID NO:178)

aagcagttcatccacagggacctggctgcccggaatgtgctggtcgga (SEQ ID NO:179)
K Q F I H R D L A A R N V L V G (SEQ ID NO:180)

aagcagttcatccacagggacctggctgcccggaatgtgctggtcgga (SEQ ID NO:181)
K Q F I H R D L A A R N V L V G (SEQ ID NO:182)

atgaactatgtgcaccgtgacctggctgcccgcaacatcctcgtcaac (SEQ ID NO:183)
M N Y V H R D L A A R N I L V N (SEQ ID NO:184)

atgcatttcattcacagggatctggcagctagaaattgccttgtttcc (SEQ ID NO:185)
M H F I H R D L A A R N C L V S (SEQ ID NO:186)

aacaagtttgtgcaccgagatctagcagcccgcaactgcatggtgtcc (SEQ ID NO:187)
N K F V H R D L A A R N C M V S (SEQ ID NO:188)

aataagttcgtccacagagaccttgctgcccggaattgcatggtagcc (SEQ ID NO:189)
N K F V H R D L A A R N C M V A (SEQ ID NO:190)

aagaagtttgtgcatcgggacctggcagcgagaaactgcatggtcgcc (SEQ ID NO:191)
K K F V H R D L A A R N C M V A (SEQ ID NO:192)

aagagattcatacaccgggacctggcggccaggaactgcatgctgaat (SEQ ID NO:193)
K R F I H R D L A A R N C M L N (SEQ ID NO:194)

atgaactatgttcaccgtgacctggctgcccgcaacatcctcgtcaac (SEQ ID NO:195)
M N Y V H R D L A A R N I L V N (SEQ ID NO:196)

atgaactatgtgcaccgcgacctggctgctcgcaacatccttgtcaac (SEQ ID NO:197)
M N Y V H R D L A A R N I L V N (SEQ ID NO:198)

atgaattatgtgcatcgggacctggctgctaggaacattctggtcaac (SEQ ID NO:199)
M N Y V H R D L A A R N I L V N (SEQ ID NO:200)

atgggctatgtgcatagagatcttgctgccagaaacatcttaatcaac (SEQ ID NO:201)
M G Y V H R D L A A R N I L I N (SEQ ID NO:202)

cagaagtttgtgcacagggacctggctgcgcggaactgcatgctggac (SEQ ID NO:203)
Q K F V H R D L A A R N C M L D (SEQ ID NO:204)

aaaaagtttgtccacagagacttggctgcaagaaactgtatgctggat (SEQ ID NO:205)
K K F V H R D L A A R N C M L D (SEQ ID NO:206)

atgggctatgttcaccgagacctcgctgctcggaacatcttgatcaac (SEQ ID NO:207)
M G Y V H R D L A A R N I L I N (SEQ ID NO:208)

aggaattttcttcatcgagatttagctgctcgaaactgcatgttgcga (SEQ ID NO:209)
R N F L H R D L A A R N C M L R (SEQ ID NO:210)

aaaaactgtatacacagggaccttgctgcaagaaactgcctggtaggt (SEQ ID NO:211)
K N C I H R D L A A R N C L V G (SEQ ID NO:212)

aagtgctgcatccaccgggacctggctgctcggaactgcctggtgaca (SEQ ID NO:213)
K C C I H R D L A A R N C L V T (SEQ ID NO:214)

atgagctatgtgcatcgtgatctggccgcacggaacatcctggtgaac (SEQ ID NO:215)
M S Y V H R D L A A R N I L V N (SEQ ID NO:216)

atgagctacgtccaccgagacctggctgctcgcaacatcctagtcaac (SEQ ID NO:217)
M S Y V H R D L A A R N I L V N (SEQ ID NO:218)

atgggctatgttcaccgagacctcgctgctcggaacatcttgatcaac (SEQ ID NO:219)
M G Y V H R D L A A R N I L I N (SEQ ID NO:220)

atgggatatgttcacagggaccttgcagctcgcaatattcttgtcaac (SEQ ID NO:221)
M G Y V H R D L A A R N I L V N (SEQ ID NO:222)

cgtcgcttggtgcaccgcgacctggcagccaggaacgtactggtgaaa (SEQ ID NO:223)
R R L V H R D L A A R N V L V K (SEQ ID NO:224)

gtgcggctcgtacacagggacttggccgctcggaacgtgctggtcaag (SEQ ID NO:225)
V R L V H R D L A A R N V L V K (SEQ ID NO:226)

agacgactcgttcatcgggatttggcagcccgtaatgtcttagtgaaa (SEQ ID NO:227)
R R L V H R D L A A R N V L V K (SEQ ID NO:228)

aaaaacttcatccacagagatcttgctgcccgaaactgcctggtaggg (SEQ ID NO:229)
K N F I H R D L A A R N C L V G (SEQ ID NO:230)

aagcgctttattcaccgtgacctggctgcccgcaatctgctgttggct (SEQ ID NO:231)
K R F I H R D L A A R N L L L A (SEQ ID NO:232)

aagaactttgtgcaccgtgacctggcggcccgcaacgtcctgctggtt (SEQ ID NO:233)
K N F V H R D L A A R N V L L V (SEQ ID NO:234)

aagaactttgtgcaccgtgacctggcggcccgcaacgtcctgctggtt (SEQ ID NO:235)
K N F V H R D L A A R N V L L V (SEQ ID NO:236)

agcaattttgtgcacagagatctggctgcaagaaatgtgttgctagtt (SEQ ID NO:237)
S N F V H R D L A A R N V L L V (SEQ ID NO:238)

cagaattacatccaccgggacctggccgccaggaacatcctcgtcggg (SEQ ID NO:239)
Q N Y I H R D L A A R N I L V G (SEQ ID NO:240)

cagcgcgttgtgcaccgggacttggccgcccggaacgtgctcgtggac (SEQ ID NO:241)
Q R V V H R D L A A R N V L V D (SEQ ID NO:242)

cggaactacattcacagagatctggctgccagaaatgtcctcgttggt (SEQ ID NO:243)
R N Y I H R D L A A R N V L V G (SEQ ID NO:244)

aagaatttcatccatagagatcttgcagctcgtaactgcctagtggga (SEQ ID NO:245)
K N F I H R D L A A R N C L V G (SEQ ID NO:246)

aacagcttcatccacagagatctggctgccagaaattgtctagtaagt (SEQ ID NO:247)
N S F I H R D L A A R N C L V S (SEQ ID NO:248)

aatggctatattcatagggatttggcggcaaggaattgtttggtcagt (SEQ ID NO:249)
N G Y I H R D L A A R N C L V S (SEQ ID NO:250)

gcatgtgtcatccacagagacttggctgccagaaattgtttggtggga (SEQ ID NO:251)
A C V I H R D L A A R N C L V G (SEQ ID NO:252)

caccaattcatacaccgggacttggctgctcgtaactgcttggtggac (SEQ ID NO:253)
H Q F I H R D L A A R N C L V D (SEQ ID NO:254)

aagcagttccttcaccgagacctggcagctcgaaactgtttggtaaac (SEQ ID NO:255)
K Q F L H R D L A A R N C L V N (SEQ ID NO:256)

cacaattatgtccaccgggacctggctgccagaaacatcttggtgaat (SEQ ID NO:257)
H N Y V H R D L A A R N I L V N (SEQ ID NO:258)



Signature Motif: H R D L A A R 
Consensus Sequence:

CA(C/T) (A/C)G(A/C/G/T) GA(C/T) (T/C)T(A/C/G/T) GC(A/C/G/T)
H       R               D       L               A

GC(A/C/G/T) (A/C)G(A/C/G/T)
A           R

Complexity: 2* x 4* = 16 x 512 = 8,192 members 



Variant 6: 3 members

agggaagtcatccacaaagacctggctgccaggaactgtgtcattgat (SEQ ID NO:259)
R E V I H K D L A A R N C V I D (SEQ ID NO:260)

aaccgctttgtgcataaggacttggctgcgcgtaactgcctggtcagt (SEQ ID NO:261)
N R F V H K D L A A R N C L V S (SEQ ID NO:262)

cacttctttgtccacaaggaccttgcagctcgcaatattttaatcgga (SEQ ID NO:263)
H F F V H K D L A A R N I L I G (SEQ ID NO:264)

Signature Motif: H K D L A A R

Consensus Sequence:

CA(C/T) AA(A/G) GAC (C/T)T(G/T) GC(A/T) GC(C/G/T) (A/C)G(C/G/T)
H       K       D   L           A       A         R

Complexity: 2^6 x 3^2 = 64 x 9 = 576 members



Variant 7: 2 members

aatcacttcatccacagggatattgccgcccggaactgcctgctgagc (SEQ ID NO:265)
N H F I H R D I A A R N C L L S (SEQ ID NO:266)

aaccacttcatccaccgagacattgctgccagaaactgcctcttgacc (SEQ ID NO:267)
N H F I H R D I A A R N C L L T (SEQ ID NO:268)

Signature Motif: H R D I A A R

Consensus Sequence:

CAC G(A/G) GA(C/T) ATT GC(C/T) GCC (A/C)G(A/G)
H   R      D       I   A       A   R

Complexity: 2^5 = 32 members



Example 3 

Family of human nuclear hormone receptors (ZnF_C4 domain) - 45 members divided 
into 9 groups 

In this example, the 45 known members of the nuclear hormone receptor family are
divided into 9 subgroups. The same segment of the Zinc Finger_C4 domain described in
Example 1 was used to design individual signature motifs and consensus sequences for each
of the 9 subgroups. As in Example 1, the consensus sequence was "reverse translated"
utilizing only those codons that encode the signature motif region of known members of the
subgroup. Division of the family into subgroups dramatically reduces the complexity from
10,616,832 (see Example 1) to 1,664.

Variant 1: 9 members 

tataatgcactgacctgtgaggggtgtaaaggtttcttcaggaga (SEQ ID NO:1)

Y N A L T C E G C K G F F R R (SEQ ID NO:2)

tacggcgtgcgcacctgtgagggctgcaaaggcttctttaagcgc (SEQ ID NO:3)

Y G V R T C E G C K G F F K R (SEQ ID NO:4)

tacggcgtgcgcacctgtgagggctgcaaaggcttctttaagcgc (SEQ ID NO:5)

Y G V R T C E G C K G F F K R (SEQ ID NO:6)

tacggcgtgcgaacctgcgagggctgcaagggctttttcaagaga (SEQ ID NO:7)

Y G V R T C E G C K G F F K R (SEQ ID NO:8)

tatggtgtccgcacatgtgagggctgcaagggcttcttcaagcgc (SEQ ID NO:9)

Y G V R T C E G C K G F F K R (SEQ ID NO:10)

tatggagcagtaacttgtgaaggctgcaaaggattttttaaaaga (SEQ ID NO:11)

Y G A V T C E G C K G F F K R (SEQ ID NO:12)

tacggggttatcacctgtgaggggtgcaagggcttcttccgccgg (SEQ ID NO:13)

Y G V I T C E G C K G F F R R (SEQ ID NO:14)

tacggagtcatcacatgtgaaggctgcaagggattctttaggagg (SEQ ID NO:15)

Y G V I T C E G C K G F F R R (SEQ ID NO:16)

tatggtgtcattacatgtgaaggctgcaagggctttttcaggaga (SEQ ID NO:17)

Y G V I T C E G C K G F F R R (SEQ ID NO:18)



Signature Motif: T C E G C K G



Consensus Sequence: 

A-C- (A/C/T) T-G- (C/T) G-A- (A/G) G-G- (C/G) T-G- (C/T) 
T C E G C

A-A- (A/G) G-G- (A/C/T) 
K G 

Complexity: 2^5 x 3^2 = 32 x 9 = 288
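The reverse translation for Variant 1 can be sketched in Python. This is an illustration, not the procedure used in the examples: it assumes the consensus is built position by position, keeping at each nucleotide of the motif region only the bases observed among the subgroup's listed members, and that the motif occupies codons 5-11 of each 15-codon sequence. The function names `motif_columns`, `render` and `complexity` are hypothetical.

```python
from math import prod

# Example 3, Variant 1 (SEQ ID NOs:1, 3, 5, 7, 9, 11, 13, 15, 17).
VARIANT_1 = [
    "tataatgcactgacctgtgaggggtgtaaaggtttcttcaggaga",
    "tacggcgtgcgcacctgtgagggctgcaaaggcttctttaagcgc",
    "tacggcgtgcgcacctgtgagggctgcaaaggcttctttaagcgc",
    "tacggcgtgcgaacctgcgagggctgcaagggctttttcaagaga",
    "tatggtgtccgcacatgtgagggctgcaagggcttcttcaagcgc",
    "tatggagcagtaacttgtgaaggctgcaaaggattttttaaaaga",
    "tacggggttatcacctgtgaggggtgcaagggcttcttccgccgg",
    "tacggagtcatcacatgtgaaggctgcaagggattctttaggagg",
    "tatggtgtcattacatgtgaaggctgcaagggctttttcaggaga",
]


def motif_columns(seqs, start, end):
    """For each nucleotide position of the motif region, the set of bases observed."""
    regions = [s.upper()[start:end] for s in seqs]
    if any(len(r) != end - start for r in regions):
        raise ValueError("every sequence must span the motif region")
    return [{r[i] for r in regions} for i in range(end - start)]


def render(columns):
    """Write the consensus codon by codon in the text's notation, e.g. A-C-(A/C/T)."""
    codons = []
    for i in range(0, len(columns), 3):
        bases = [
            next(iter(col)) if len(col) == 1 else "(" + "/".join(sorted(col)) + ")"
            for col in columns[i:i + 3]
        ]
        codons.append("-".join(bases))
    return " ".join(codons)


def complexity(columns):
    """Number of distinct sequences the consensus encodes."""
    return prod(len(col) for col in columns)


# Motif T C E G C K G = codons 5-11 of each 15-codon sequence,
# i.e. nucleotides 12..32 (0-based; slice end 33 is exclusive).
cols = motif_columns(VARIANT_1, 12, 33)
print(render(cols))      # A-C-(A/C/T) T-G-(C/T) G-A-(A/G) G-G-(C/G) T-G-(C/T) A-A-(A/G) G-G-(A/C/T)
print(complexity(cols))  # 288
```

Because each position is widened independently, the consensus can also encode codon combinations not seen in any single member; the complexity counts all of them.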



Variant 2: 9 members 

tatggagtgtacagctgcgaggggtgcaagggcttcttcaagcgg (SEQ ID NO:19)

Y G V Y S C E G C K G F F K R (SEQ ID NO:20)

tacggggtttacagctgtgagggttgcaagggcttcttcaaacgc (SEQ ID NO:21)

Y G V Y S C E G C K G F F K R (SEQ ID NO:22)

tacggggtatacagttgtgaaggctgcaaagggttcttcaagagg (SEQ ID NO:23)

Y G V Y S C E G C K G F F K R (SEQ ID NO:24)

tacaacgtgctcagctgcgaaggctgcaagggcttcttccggcgc (SEQ ID NO:25)

Y N V L S C E G C K G F F R R (SEQ ID NO:26)

tacaatgttctgagctgcgagggctgcaagggattcttccgccgc (SEQ ID NO:27)

Y N V L S C E G C K G F F R R (SEQ ID NO:28)

tatgggatcatctcctgtgagggctgcaaagggtttttcaagcgg (SEQ ID NO:29)

Y G I I S C E G C K G F F K R (SEQ ID NO:30)

tatggggtcagctcttgtgaaggctgcaagggcttctttcgccga (SEQ ID NO:31)

Y G V S S C E G C K G F F R R (SEQ ID NO:32)

tatggggtcagctcttgtgaaggctgcaagggcttctttcgccga (SEQ ID NO:33)

Y G V S S C E G C K G F F R R (SEQ ID NO:34)

tatggggctgtcagttgtgaaggttgcaaaggtttcttcaaaagg (SEQ ID NO:35)

Y G A V S C E G C K G F F K R (SEQ ID NO:36)



Signature Motif: S C E G C K G



Consensus Sequence:

(A/T) - (C/G) - (C/T) T-G- (C/T) G-A- (A/G) G-G- (C/G/T) T-G-C 
S C E G C 

A-A- (A/G) G-G- (A/C/G/T) 
K G

Complexity: 2^6 x 3 x 4 = 64 x 3 x 4 = 768



Variant 3: 2 members 

tacggtgtcttcacctgcgagggctgcaagagctttttcaagcga (SEQ ID NO:37)

Y G V F T C E G C K S F F K R (SEQ ID NO:38)

tacggccagttcacgtgcgagggctgcaagagcttcttcaagcgc (SEQ ID NO:39)

Y G Q F T C E G C K S F F K R (SEQ ID NO:40)



Signature Motif: T C E G C K S



Consensus Sequence: 

A-C- (C/G) T-G-C G-A-G G-G-C T-G-C A-A- (A/G) A-G- (C/T) 

T C E G C K S

Complexity: 2^3 = 8



Variant 4: 2 members 

tacggggtctacgcctgcgacggctgctcaggttttttcaaacgg (SEQ ID NO:41)

Y G V Y A C D G C S G F F K R (SEQ ID NO:42)

tatggcatctatgcctgcaacggctgcagcggcttcttcaagagg (SEQ ID NO:43)

Y G I Y A C N G C S G F F K R (SEQ ID NO:44)

Signature Motif: A C D/N G C S G

Consensus Sequence:

G-C-C T-G-C (A/G)-A-C G-G-C T-G-C (A/T)-(C/G)-(A/C)
A C D/N G C S 

G-G- (C/T) 
G 

Complexity: 2^5 = 32
Variant 5: 2 members 

tatggggcatccacctgtgatgggtgcaagggtttcttcagacgc (SEQ ID NO:45) 

Y G A S T C D G C K G F F R R (SEQ ID NO:46)

tacggtgcctcgagctgtgacggctgcaagggcttcttccggagg (SEQ ID NO:47) 

Y G A S S C D G C K G F F R R (SEQ ID NO:48)

Signature Motif: T/S C D G C K G

Consensus Sequence:

A- (C/G) -C T-G-T G-A- (C/T) G-G- (C/G) T-G-C A-A-G 
T/S C D G C K

G-G- (C/T) 
G 

Complexity: 2^4 = 16
Variant 6: 7 members 

tatggggtcagcgcctgtgagggctgcaagggcttcttccgccgc (SEQ ID NO:49)

Y G V S A C E G C K G F F R R (SEQ ID NO:50)

tatggggtcagcgcctgtgagggatgtaagggctttttccgcaga (SEQ ID NO:51)

Y G V S A C E G C K G F F R R (SEQ ID NO:52)

tacggtgtgcacgcctgcgagggctgcaagggctttttccgtcgg (SEQ ID NO:53)

Y G V H A C E G C K G F F R R (SEQ ID NO:54)

tatggagttcatgcttgcgaaggctgtaagggtttctttcggaga (SEQ ID NO:55)

Y G V H A C E G C K G F F R R (SEQ ID NO:56)

tacggtgttcatgcatgtgaggggtgcaagggcttcttccgtcgt (SEQ ID NO:57)

Y G V H A C E G C K G F F R R (SEQ ID NO:58)

tacggagtccacgcgtgtgaaggctgcaagggcttctttcggcga (SEQ ID NO:59)

Y G V H A C E G C K G F F R R (SEQ ID NO:60)

tatggagttcatgcttgtgaaggatgcaagggtttcttccggaga (SEQ ID NO:61)

Y G V H A C E G C K G F F R R (SEQ ID NO:62)

Signature Motif : A C E G C K G 



Consensus Sequence: 

G-C-(A/C/G/T) T-G-(C/T) G-A-(A/G) G-G-(A/C/G) T-G-(C/T)
A C E G C 

A-A-G G-G- (C/T) 
K G 

Complexity: 2^4 x 3 x 4 = 16 x 12 = 192



Variant 7: 6 members 

ttcaatgtcatgacatgtgaaggatgcaagggctttttcaggagg (SEQ ID NO:63)

F N V M T C E G C K G F F R R (SEQ ID NO:64)

tttaatgcgctgacttgtgagggctgcaagggtttcttcaggaga (SEQ ID NO:65)

F N A L T C E G C K G F F R R (SEQ ID NO:66)

taccgctgtatcacgtgtgaaggctgcaagggtttctttagaaga (SEQ ID NO:67)

Y R C I T C E G C K G F F R R (SEQ ID NO:68)

taccgctgtatcacttgtgagggctgcaagggcttctttcgccgc (SEQ ID NO:69)

Y R C I T C E G C K G F F R R (SEQ ID NO:70)

tacggactgctcacgtgtgagagctgcaagggcttcttcaagcgc (SEQ ID NO:71)

Y G L L T C E S C K G F F K R (SEQ ID NO:72)

tatgggctcctcacctgtgaaagctgcaagggattttttaagcga (SEQ ID NO:73)

Y G L L T C E S C K G F F K R (SEQ ID NO:74)

Signature Motif: T C E G/S C K G



Consensus Sequence: 

A-C-(A/C/G/T) T-G-T G-A-(A/G) (A/G)-G-(A/C) T-G-C A-A-G
T C E G/S C K 

G-G-(A/C/T)
G 

Complexity: 2^3 x 3 x 4 = 8 x 3 x 4 = 96



Variant 8: 4 members 

tatggggtagtcacctgtggcagctgcaaagttttcttcaaaaga (SEQ ID NO:75)

Y G V V T C G S C K V F F K R (SEQ ID NO:76)

tatggagctctcacatgtggaagctgcaaggtcttcttcaaaaga (SEQ ID NO:77)

Y G A L T C G S C K V F F K R (SEQ ID NO:78)

tatggtgtccttacctgtgggagctgtaaggtcttctttaagagg (SEQ ID NO:79)

Y G V L T C G S C K V F F K R (SEQ ID NO:80)

tatggagtcttaacttgtggaagctgtaaagttttcttcaaaaga (SEQ ID NO:81)

Y G V L T C G S C K V F F K R (SEQ ID NO:82)

Signature Motif: T C G S C K V 



Consensus Sequence: 

A-C-(A/C/T) T-G-T G-G-(A/C/G) A-G-C T-G-(C/T) A-A-(A/G)
T C G S C K 

G-T- (C/T) 
V 

Complexity: 2^3 x 3^2 = 8 x 9 = 72

Variant 9: 4 members

tacggcgtggcctcctgcgaggcttgcaaggccttcttcaagagg (SEQ ID NO:83)

Y G V A S C E A C K A F F K R (SEQ ID NO:84)

tatggtgtggcatcctgtgaggcctgcaaagccttcttcaagagg (SEQ ID NO:85)

Y G V A S C E A C K A F F K R (SEQ ID NO:86)

tatggagtctggtcctgtgagggctgcaaggccttcttcaagaga (SEQ ID NO:87)

Y G V W S C E G C K A F F K R (SEQ ID NO:88)

tatggagtctggtcgtgtgaaggatgtaaggccttttttaaaaga (SEQ ID NO:89)

Y G V W S C E G C K A F F K R (SEQ ID NO:90)

Signature Motif: S C E A/G C K A



Consensus Sequence: 

T-C- (C/G) T-G- (C/T) G-A- (A/G) G- (C/G) - (A/C/T) T-G- (C/T) 
S C E A/G C

A-A- (A/G) G-C-C 
K A 

Complexity: 2^6 x 3 = 64 x 3 = 192

Total complexity of library: the sum of the complexities of subgroups 1-9 = 1,664.



The library is constructed from the following semi-randomized oligonucleotides:
Variant 1 (SEQ ID NO:269) 

5'-pCCAGGACGACAAAAAGACHTGYGARGGSTGYAARGGHCTTTTTAGGCTTTTCGG-3'

Variant 2 (SEQ ID NO:270) 

5'-pCCAGGACGACAAAAAGWSYTGYGARGGBTGCAARGGNCTTTTTAGGCTTTTCGG-3'

Variant 3 (SEQ ID NO:271)

5'-pCCAGGACGACAAAAAGACSTGCGAGGGCTGCAARAGYCTTTTTAGGCTTTTCGG-3'
Variant 4 (SEQ ID NO:272) 

5'-pCCAGGACGACAAAAAGGCCTGCRACGGCTGCWSMGGYCTTTTTAGGCTTTTCGG-3'

Variant 5 (SEQ ID NO:273)

5'-pCCAGGACGACAAAAAGASCTGTGAYGGSTGCAAGGGYCTTTTTAGGCTTTTCGG-3'

Variant 6 (SEQ ID NO:274) 

5'-pCCAGGACGACAAAAAGGCNTGYGARGGVTGYAAGGGYCTTTTTAGGCTTTTCGG-3'
Variant 7 (SEQ ID NO:275)

5'-pCCAGGACGACAAAAAGACNTGTGARRGMTGCAAGGGHCTTTTTAGGCTTTTCGG-3'
Variant 8 (SEQ ID NO:276) 

5'-pCCAGGACGACAAAAAGACHTGTGGVAGCTGYAARGTYCTTTTTAGGCTTTTCGG-3'
Variant 9 (SEQ ID NO:277) 

5'-pCCAGGACGACAAAAAGTCSTGYGARGSHTGYAARGCCCTTTTTAGGCTTTTCGG-3'

In the above, mixtures of nucleotides (wobbles) are denoted using the following standard
nomenclature:



Table 2

Wobble    Nucleotides
B         C+G+T
D         A+G+T
H         A+C+T
K         G+T
M         A+C
N         A+C+G+T
R         A+G
S         C+G
V         A+C+G
W         A+T
Y         C+T
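Because each wobble symbol in Table 2 stands for a fixed set of bases, the complexity of a semi-randomized oligonucleotide is the product, over its degenerate positions, of the number of bases allowed at each position. The Python sketch below checks this for the nine subgroups; the 21-nt cores are transcribed from the consensus sequences given for Variants 1-9 of Example 3, and the helper names `complexity` and `expand` are hypothetical.

```python
from itertools import product
from math import prod

# IUPAC wobble codes, matching Table 2 (plus the four unambiguous bases).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# 21-nt degenerate cores for Example 3, Variants 1-9, transcribed from the
# consensus sequences given for each subgroup.
CORES = {
    1: "ACHTGYGARGGSTGYAARGGH",
    2: "WSYTGYGARGGBTGCAARGGN",
    3: "ACSTGCGAGGGCTGCAARAGY",
    4: "GCCTGCRACGGCTGCWSMGGY",
    5: "ASCTGTGAYGGSTGCAAGGGY",
    6: "GCNTGYGARGGVTGYAAGGGY",
    7: "ACNTGTGARRGMTGCAAGGGH",
    8: "ACHTGTGGVAGCTGYAARGTY",
    9: "TCSTGYGARGSHTGYAARGCC",
}


def complexity(core):
    """Distinct sequences in a degenerate oligo: product of base choices per position."""
    return prod(len(IUPAC[b]) for b in core)


def expand(core):
    """Enumerate every concrete sequence a degenerate oligo stands for."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in core))]


per_variant = {v: complexity(c) for v, c in CORES.items()}
print(per_variant)
print(sum(per_variant.values()))  # 1664
```

`expand` is only practical for small libraries like this one; for the fully random designs of Example 1 the product formula alone should be used.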



The semi-randomized oligonucleotides are resuspended in TE buffer and combined in
direct proportion to their complexities to a final concentration of 0.92 µM. One hundred
eight pmol of the semi-randomized oligonucleotide mixture is combined with 21.6 pmol
each of adapter oligonucleotides Univ-1(FseI) and Univ-2(AscI).

Univ-1(FseI): 5'-CTTTTTGTCGTCCTGGCCGG-3' (SEQ ID NO:278)
Univ-2(AscI): 5'-pCGCGCCGAAAAGCCTAAAAAG-3' (SEQ ID NO:279)
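The mixing step is simple arithmetic, sketched below in Python. It assumes that "in direct proportion to their complexities" means each variant's molar share equals its complexity divided by the library total, so that every one of the 1,664 members is present at about the same molar amount; the variable names are illustrative only.

```python
# Example 3 subgroup complexities (Variants 1-9).
COMPLEXITY = {1: 288, 2: 768, 3: 8, 4: 32, 5: 16, 6: 192, 7: 96, 8: 72, 9: 192}
MIX_PMOL = 108.0      # pmol of oligonucleotide mixture per annealing reaction
MIX_CONC_UM = 0.92    # final concentration of the mixture; 1 uM == 1 pmol/ul
ADAPTER_PMOL = 21.6   # pmol of each adapter, Univ-1(FseI) and Univ-2(AscI)

library_size = sum(COMPLEXITY.values())
# Assumption: molar share of each variant = its complexity / total complexity.
pmol_per_variant = {v: MIX_PMOL * c / library_size for v, c in COMPLEXITY.items()}
pmol_per_member = MIX_PMOL / library_size
mix_volume_ul = MIX_PMOL / MIX_CONC_UM
oligo_to_adapter = MIX_PMOL / ADAPTER_PMOL

for v, pmol in pmol_per_variant.items():
    print(f"Variant {v}: {pmol:.3f} pmol")
print(f"{library_size} members, {pmol_per_member * 1000:.1f} fmol each")
print(f"{mix_volume_ul:.1f} ul of mixture; oligo:adapter = {oligo_to_adapter:.1f}:1")
```

With these numbers each member is present at roughly 65 fmol, 108 pmol of mixture corresponds to about 117 µl at 0.92 µM, and the oligonucleotide mixture is at a 5:1 molar ratio to each adapter.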

The oligonucleotides are annealed by heating to 70 °C for 5 minutes and slowly
cooling to room temperature (~3 hours). The annealed oligonucleotides are ligated to 0.216
pmol of an FseI/AscI-digested vector bearing opposing human U6 and murine U6 promoters.
Construction of this vector is described in U.S. Patent Application Serial Number
10/626,512. The nucleotide sequence of the human U6 and murine U6 promoters between
the TATA box and the transcription start site was modified to contain FseI and AscI
restriction sites, respectively, as indicated below:

Human U6/murine U6 Opposing Promoter Cassette
(FseI and AscI sites in lower case letters):

GGATCCAAGCTTAAGGTCGGGCAGGAAGAGGGCCTATTTCCCATGATTCC 

TTCATATTTGCATATACGATACAAGGCTGTTAGAGAGATAATTAGAATTA

ATTTGACTGTAAACACAAAGATATTAGTACAAAATACGTGACGTAGAAAG 

TAATAATTTCTTGGGTAGTTTGCAGTTTTAAAATTATGTTTTAAAATGGA 

CTATCATATGCTTACCGTAACTTGAAAGTATTTCGATTTCTTGGCTTTAT 

ATATCggccggccTCGAggcgcgccATATTTATAGTCTCAAAACACACAA

TTACTTTACAGTTAGGGTGAGTTTCCTTTTGTGCTGTTTTTTAAAATAAT 

AATTTAGTATTTGTATCTCTTATAGAAATCCAAGCCTATCATGTAAAATG

TAGCTAGTATTAAAAAGAACAGATTATCTGTCTTTTATCGCACATTAAGC

CTCTATAGTTACTAGGAAATATTATATGCAAATTAACCGGGGCAGGGGAG 

TAGCCGAGCTTCTCCCACAAGTCTGTGCGAGGGGGCCGGCGCGGGCCTAG 

AGATGGCGGCGTCGGATCC (SEQ ID NO:280) 
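The cloning design can be checked with a few string operations. The Python sketch below assumes the fixed flanks shared by SEQ ID NOs:269-277 (5' CCAGGACGACAAAAAG, 3' CTTTTTAGGCTTTTCGG) and the adapter sequences SEQ ID NOs:278-279 with the 5' phosphate omitted. It confirms that each adapter pairs with one flank and leaves a 4-nt single-stranded tail, and locates the FseI and AscI sites in the cassette segment between the promoters. FseI (GGCCGG^CC) leaves a 3' CCGG overhang and AscI (GG^CGCGCC) a 5' CGCG overhang, which is consistent with the tails found here; the helper name `revcomp` is illustrative.

```python
# Reverse-complement helper for unambiguous DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


FLANK_5 = "CCAGGACGACAAAAAG"      # 5' end of every semi-randomized oligo
FLANK_3 = "CTTTTTAGGCTTTTCGG"     # 3' end of every semi-randomized oligo
UNIV_1 = "CTTTTTGTCGTCCTGGCCGG"   # Univ-1(FseI)
UNIV_2 = "CGCGCCGAAAAGCCTAAAAAG"  # Univ-2(AscI), 5' phosphate omitted

# Univ-1 pairs with the 5' flank; its extra 3' bases form a single-stranded tail.
univ1_pairs = UNIV_1.startswith(revcomp(FLANK_5))
univ1_tail = UNIV_1[len(FLANK_5):]
# Univ-2 pairs with the 3' flank; its extra 5' bases form a single-stranded tail.
univ2_pairs = UNIV_2.endswith(revcomp(FLANK_3))
univ2_tail = UNIV_2[: len(UNIV_2) - len(FLANK_3)]

# Recognition sites in the cassette segment between the promoters.
CASSETTE_SEGMENT = "ATATCggccggccTCGAggcgcgccATATTTATAGTCTC"
FSEI, ASCI = "GGCCGGCC", "GGCGCGCC"
fsei_at = CASSETTE_SEGMENT.upper().find(FSEI)
asci_at = CASSETTE_SEGMENT.upper().find(ASCI)

print(univ1_pairs, univ1_tail)  # True CCGG
print(univ2_pairs, univ2_tail)  # True CGCG
print(fsei_at, asci_at)         # 5 17
```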

Ligation is performed overnight at 16 °C. One-fifth of the ligation reaction is used
to transform electrocompetent bacteria (DH12S), resulting in lO" - lO' cfu/µg DNA.

The relatively low complexity (1,664) permits the delivery of the resulting library to
the host cells by transient transfection in a 96-well format. The library is arrayed by picking
~4,000 individual colonies and inoculating 750 µl/well of TB media (containing appropriate
antibiotics) in 2-ml deep-well 96-well plates (VWR). Following incubation for 20 hours, the
cultures are pooled in groups of 10. DNA minipreps (Qiaprep Spin Miniprep Kits, Qiagen)
are prepared from 1.5 ml of pooled bacterial culture. (The remainder of each culture is
aliquotted and frozen for future use.) The purified DNA from each pool is quantitated using
Rediplate 96 PicoGreen dsDNA Quantitation Kits (Molecular Probes). DNA from each pool
is diluted to 100 ng/µl and stored in 96-well plates. Each well contains DNA encoding up to
10 unique siRNAs. Transfection of target cells is performed in a 96-well format using
standard methods.
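The arraying numbers can be put in context with a short Python calculation. It assumes, as an idealization not stated in the text, that every library member is equally likely to be picked, so the expected fraction of distinct members among k colonies drawn from N members is 1 - (1 - 1/N)^k; plate counts simply round up to whole 96-well plates.

```python
import math

LIBRARY_SIZE = 1_664    # unique members in the Example 3 library
COLONIES = 4_000        # "~4,000 individual colonies" picked for arraying
POOL_SIZE = 10          # cultures pooled per miniprep
WELLS_PER_PLATE = 96

# Idealization: every library member is equally likely to be picked.
coverage = 1 - (1 - 1 / LIBRARY_SIZE) ** COLONIES
oversampling = COLONIES / LIBRARY_SIZE
culture_plates = math.ceil(COLONIES / WELLS_PER_PLATE)
pools = math.ceil(COLONIES / POOL_SIZE)
dna_plates = math.ceil(pools / WELLS_PER_PLATE)

print(f"{oversampling:.1f}x oversampling; expected coverage {coverage:.1%}")
print(f"{culture_plates} culture plates -> {pools} pools -> {dna_plates} DNA plates")
```

Under this assumption, picking about 2.4 times the library size is expected to recover roughly 91% of the distinct members, in 400 pooled DNA samples; unequal representation of members in the transformed library would lower that figure.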
WHAT IS CLAIMED IS:

1. A method for generating an siRNA expression library for selective
post-transcriptional silencing of genes encoding a family of proteins, the method comprising:

i. identifying a consensus sequence for the family of proteins; and,

ii. generating an siRNA expression library whose members encode siRNA
molecules that target at least all mRNA encoding all known members of the family of
proteins.

2. The method of claim 1, wherein the consensus sequence comprises
between 15 to 30 nucleotides.

3. The method of claim 1, wherein the consensus sequence comprises 
between 18 to 24 nucleotides. 

4. The method of claim 1, wherein the library comprises between 50 and
one million unique members.

5. The method of claim 1, wherein the library comprises between 20,000 
and 100,000 unique members. 

6. The method of claim 1, wherein the family of proteins is selected from
the group consisting of: G protein coupled receptors, ion channels, receptor tyrosine kinases,
non-receptor tyrosine kinases, nuclear hormone receptors, GTPases, ATPases, 
serine/threonine kinases, proteases, matrix metalloproteinases (MMPs), GTPase-activating 
proteins (GAPs) and E3 ubiquitin ligases. 

7. The method according to claim 1 wherein the step of identifying a 
consensus sequence comprises identifying at least one signature motif for the family of 
proteins. 

8. The method according to claim 1 wherein the step of identifying a 
consensus sequence comprises identifying two or more variants of a signature motif for the 
family of proteins.

9. An siRNA expression library for selective post-transcriptional
silencing of genes encoding a family of proteins, wherein members of the library encode
siRNA molecules that are between 15 to 30 nucleotides in length and target at least all
mRNA encoding all known members of the family of proteins, and wherein the library
comprises up to one million unique members.

10. The library of claim 9, wherein the library comprises up to 100,000 
unique members. 

11. The library of claim 9, wherein the family of proteins is selected from
the group consisting of: G protein coupled receptors, ion channels, receptor tyrosine kinases,
non-receptor tyrosine kinases, nuclear hormone receptors, GTPases, ATPases,
serine/threonine kinases, proteases, matrix metalloproteinases (MMPs), GTPase-activating
proteins (GAPs) and E3 ubiquitin ligases.

12. The library of claim 9, wherein the siRNA molecules are between 18
to 24 nucleotides in length.